Regular expression in Mandarin
=== Find non-ASCII characters (Chinese and other non-English text) ===

==== Find non-ASCII characters in Google sheet ====
Applies to: the regular expression functions in Google Sheets, such as [https://support.google.com/docs/answer/3098292?hl=zh-Hant REGEXMATCH], [https://support.google.com/docs/answer/3098244?hl=en REGEXEXTRACT], and [https://support.google.com/docs/answer/3098245?hl=zh-Hant REGEXREPLACE], as well as the search feature in Notepad++.
<pre>
[^\x00-\x7F]+
</pre>

==== Find non-ASCII characters in LibreOffice ====
Applies to: the [https://zh-tw.libreoffice.org/ LibreOffice] [https://help.libreoffice.org/6.2/en-US/text/scalc/01/func_regex.html REGEX] function<ref>[https://help.libreoffice.org/6.2/en-US/text/shared/01/02100001.html?&DbPAR=WRITER&System=MAC List of Regular Expressions]</ref> and the Multi-Rename Tool in Total Commander<ref>To replace non-English text while leaving the . character untouched: <nowiki>[^\u0000-\u007F.]+ </nowiki></ref><ref>[http://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters javascript - Regular expression to match non-english characters? - Stack Overflow]</ref>
<pre>
[^\u0000-\u007F]+
</pre>

==== Find Chinese characters in Google sheet ====
Example: if A2 contains at least one Chinese character, the cell shows "Chinese". If it contains no Chinese characters, the cell shows "English":
<pre>
=IF(REGEXMATCH(A2, "[\一-\龥]"), "Chinese", "English")
</pre>

{{exclaim}} Google Sheets does not support the following syntax and reports the error "... is not a valid regular expression."
* {{kbd | key=<nowiki>[\u4e00-\u9fa5]</nowiki>}}
* {{kbd | key=<nowiki>[^\u4e00-\u9fa5]</nowiki>}}
* {{kbd | key=<nowiki>[\p{Script=Hans}]</nowiki>}}
* {{kbd | key=<nowiki>[\p{Han}]</nowiki>}}

==== Find Chinese characters in MySQL ====
Find rows whose `column_name` value contains Chinese characters. Applies to: MySQL<ref>[https://stackoverflow.com/questions/9795137/how-to-detect-rows-with-chinese-characters-in-mysql How to detect rows with chinese characters in MySQL? - Stack Overflow]</ref><ref>[https://stackoverflow.com/questions/401771/how-can-i-find-non-ascii-characters-in-mysql How can I find non-ASCII characters in MySQL? - Stack Overflow]</ref>
<pre>
SELECT `column_name`
FROM `table_name`
WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';
</pre>

Explanation
* The pattern '^(..)*(E[4-9])' runs against the hexadecimal form of the value, in which every byte is written as two hex digits. Starting from the beginning of the string (^), it skips zero or more complete two-digit pairs ((..)*) and then requires a byte in the range E4 to E9.
* The ^(..)* prefix keeps the match aligned to byte boundaries. Without it, E[4-9] could also match across two adjacent bytes (the low digit of one byte followed by the high digit of the next) and flag strings that contain no Chinese characters at all. The match may occur anywhere in the value, not only at its beginning.
* Assuming the column is stored as UTF-8, bytes E4 to E9 only appear as the lead byte of a three-byte sequence encoding U+4000 to U+9FFF, a range that contains the CJK Unified Ideographs block (U+4E00 to U+9FFF).

==== Find non-ASCII characters in MySQL ====
Find rows whose `column_name` value is not entirely ASCII:
<pre>
SELECT `column_name`
FROM `table_name`
WHERE `column_name` <> CONVERT(`column_name` USING ASCII)
</pre>

==== Find non-ASCII characters in PHP ====
Find values that contain Chinese characters. Chinese characters here include both Traditional and Simplified Chinese, but exclude punctuation (for example {{kbd | key = <nowiki>,</nowiki>}}), full-width punctuation (for example {{kbd | key = <nowiki>,</nowiki>}}), and special symbols such as emoji: {{kbd | key = ⭐}}.

PHP: exact match
<pre>
// Sample input; replace with the value to check.
$string = '繁體中文';

// approach 1: explicit code point range
if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $string)) {
    echo "All characters are Chinese characters" . PHP_EOL;
} else {
    echo "Some characters are not Chinese characters" . PHP_EOL;
}

// approach 2: Unicode script property
if (preg_match('/^[\p{Han}]+$/u', $string)) {
    echo "All characters are Chinese characters" . PHP_EOL;
} else {
    echo "Some characters are not Chinese characters" . PHP_EOL;
}
</pre>

partial match ([http://sandbox.onlinephpfunctions.com/code/d780845d20877c0fd2e693b28ed02a10d250d39e online demo] hosted by [http://sandbox.onlinephpfunctions.com/ PHP Sandbox])
<pre>
// approach 1: Unicode script property
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\p{Han}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);

// approach 2: explicit code point range
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\x{4e00}-\x{9fa5}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);
</pre>

Troubleshooting: error message
<pre>preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 8</pre>

Solution: the pattern passed to [http://php.net/manual/en/function.preg-match.php preg_match()] needs the {{kbd | key = u }} modifier<ref>[https://stackoverflow.com/questions/32375531/preg-match-compilation-failed-character-value-in-x-or-o-is-too-large-a php - preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 27 on line number 25 - Stack Overflow]</ref>.

==== Find non-ASCII characters in JavaScript ====
* [https://stackoverflow.com/questions/21109011/javascript-unicode-string-chinese-character-but-no-punctuation regex - Javascript unicode string, chinese character but no punctuation - Stack Overflow]

References:
* [http://blog.csdn.net/tinyletero/article/details/8201465 unicode编码 \u4e00-\u9fa5 匹配所有中文 - CSDN博客]
* [https://stackoverflow.com/questions/38168419/codeigniter-form-validation-for-chinese-words php - CodeIgniter Form Validation for Chinese Words - Stack Overflow]
* [https://zh.wikipedia.org/zh-tw/%E4%B8%AD%E6%97%A5%E9%9F%93%E7%B5%B1%E4%B8%80%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97%E5%88%97%E8%A1%A8 中日韓統一表意文字列表 - 維基百科,自由的百科全書]
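
The JavaScript subsection above only links out, so the following is a minimal sketch of how the same checks might look in JavaScript, not a definitive implementation. It assumes an ES2018 or later engine (for <nowiki>\p{Script=Han}</nowiki> with the u flag) and a global TextEncoder (modern browsers, recent Node.js). All function names (findHanRuns, isAllHan, hasNonAscii, toHex, hexHasCjkLeadByte) are hypothetical and chosen for illustration. The last two helpers mimic the idea behind the MySQL HEX() pattern to show why the <nowiki>^(..)*</nowiki> prefix matters.

```javascript
// Hypothetical helper names; none of them come from the original page.

// Runs of Han characters. \p{Script=Han} requires the `u` flag (ES2018+).
const HAN_RUN = /\p{Script=Han}+/gu;

// Any character outside the ASCII range U+0000 to U+007F.
const NON_ASCII = /[^\x00-\x7F]/;

// Partial match: return every run of Han characters found in `text`.
function findHanRuns(text) {
  return text.match(HAN_RUN) || [];
}

// Exact match: true only if `text` is non-empty and consists solely of Han characters.
function isAllHan(text) {
  return /^\p{Script=Han}+$/u.test(text);
}

// True if `text` contains at least one non-ASCII character.
function hasNonAscii(text) {
  return NON_ASCII.test(text);
}

// Upper-case hex dump of the UTF-8 bytes of `text`, two digits per byte.
function toHex(text) {
  return Array.from(new TextEncoder().encode(text), (b) =>
    b.toString(16).toUpperCase().padStart(2, '0')
  ).join('');
}

// Byte-aligned search for a UTF-8 lead byte E4..E9 in the hex dump.
function hexHasCjkLeadByte(text) {
  return /^(..)*(E[4-9])/.test(toHex(text));
}

console.log(findHanRuns('繁體中文-简体中文-English-12345-。,!-.,!-⭐'));
// → [ '繁體中文', '简体中文' ]

// 'N@' is pure ASCII, but its hex dump "4E40" contains "E4" across two bytes.
console.log(toHex('N@'), /E[4-9]/.test(toHex('N@')), hexHasCjkLeadByte('N@'));
// → 4E40 true false

console.log(toHex('中'), hexHasCjkLeadByte('中'));
// → E4B8AD true
```

The 'N@' case shows the false positive that an unaligned <nowiki>E[4-9]</nowiki> search would produce and that the byte-aligned pattern avoids.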