=== Find Non-ASCII Characters (Chinese/Non-English Text) ===

==== In LibreOffice ====
<pre>[^\u0000-\u0080]+</pre>

==== Find Chinese Characters in Google Sheets ====
Example: If cell {{kbd | key=A2}} contains any Chinese character, display “Chinese”; otherwise display “English”:
<pre>=IF(REGEXMATCH(A2, "[\一-\龥]"), "Chinese", "English")</pre>

==== Find Non-ASCII Characters in Google Sheets ====
Extract the first run of non-ASCII characters (such as Chinese, Japanese, emoji, etc.) from cell {{kbd | key=A2}}:
<pre>
=IF(ISERROR(REGEXEXTRACT(A2, "[^\x00-\x80]+")), "", REGEXEXTRACT(A2, "[^\x00-\x80]+"))
</pre>

Explanation of the regular expression {{kbd | key=<nowiki>[^\x00-\x80]+</nowiki>}}:
* {{kbd | key=<nowiki>[\x00-\x80]</nowiki>}}: Covers character codes 0-128, that is, the ASCII range plus one extra character.
** The standard ASCII range is 0-127 ({{kbd | key=<nowiki>0x00-0x7F</nowiki>}}, i.e. {{kbd | key=<nowiki>[\x00-\x7F]</nowiki>}}).<ref>[https://www.commfront.com/pages/ascii-chart ASCII Chart – CommFront]</ref>
** Character 128 ({{kbd | key=<nowiki>0x80</nowiki>}}) is the first character in the extended ASCII range and is not part of the original ASCII standard.<ref>[https://en.wikipedia.org/wiki/UTF-8 UTF-8 - Wikipedia]</ref><ref>[https://en.wikipedia.org/wiki/Control_character Control character - Wikipedia]</ref> The pattern above therefore treats it as ASCII and does not match it; use {{kbd | key=<nowiki>[^\x00-\x7F]+</nowiki>}} for a strict ASCII boundary.
* {{kbd | key=<nowiki>[^...]</nowiki>}}: Means "not" these characters
* {{kbd | key=<nowiki>+</nowiki>}}: Means one or more

Overall meaning: Matches one or more consecutive non-ASCII characters.

==== Find Chinese Characters in MySQL ====
Find rows where <code>column_name</code> contains Chinese characters:
<pre lang="sql">SELECT `column_name`
FROM `table_name`
WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';</pre>

Find rows where <code>column_name</code> contains only Chinese characters:
<pre lang="sql">SELECT `column_name`
FROM `table_name`
WHERE `column_name` REGEXP '^[一-龯]+$';</pre>

Explanation:
* {{kbd | key=<nowiki>[一-龯]</nowiki>}}: Character set that matches all characters from "一" to "龯" in Unicode
* "一" has Unicode code point {{kbd | key=<nowiki>U+4E00</nowiki>}}<ref>[https://www.compart.com/en/unicode/U+4E00 “一” U+4E00 CJK Unified Ideograph-4E00 Unicode Character]</ref>
* "龯" has Unicode code point {{kbd | key=<nowiki>U+9FAF</nowiki>}}<ref>[https://www.compart.com/en/unicode/U+9FAF “龯” U+9FAF CJK Unified Ideograph-9FAF Unicode Character]</ref>
* This range (U+4E00-U+9FAF) lies within the CJK Unified Ideographs block (U+4E00-U+9FFF), which already covers over 99% of daily Chinese usage requirements. [https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_B Extension B] and later blocks mainly contain ancient Chinese characters, variant characters, etc., which rarely appear in modern texts.

==== Find Non-ASCII Characters in MySQL ====
Find rows where <code>column_name</code> is not entirely ASCII characters:
<syntaxhighlight lang="sql">SELECT `column_name`
FROM `table_name`
WHERE `column_name` <> CONVERT(`column_name` USING ASCII);</syntaxhighlight>

==== Find Chinese Characters in PHP ====
'''Exact match:'''
<syntaxhighlight lang="php">// Approach 1: explicit code point range U+4E00 to U+9FA5
if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $string)) {
    echo "All text is Chinese characters" . PHP_EOL;
} else {
    echo "Some text is not Chinese characters" . PHP_EOL;
}

// Approach 2: Unicode script property for Han characters
if (preg_match('/^[\p{Han}]+$/u', $string)) {
    echo "All text is Chinese characters" . PHP_EOL;
} else {
    echo "Some text is not Chinese characters" . PHP_EOL;
}</syntaxhighlight>

'''Partial match:'''
<syntaxhighlight lang="php">// Approach 1: Unicode script property for Han characters
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\p{Han}]+/u';
// PREG_OFFSET_CAPTURE also returns the byte offset of each match
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);

// Approach 2: explicit code point range U+4E00 to U+9FA5
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\x{4e00}-\x{9fa5}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);</syntaxhighlight>
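
==== Find Chinese and Non-ASCII Characters in Python ====
The same two character classes can be tried out in Python. The following is only a minimal sketch, not part of the original recipes: the function names <code>non_ascii_runs</code>, <code>contains_han</code>, and <code>is_all_han</code> are hypothetical, the non-ASCII class assumes the strict 0x00-0x7F boundary, and the Han class assumes the "一" (U+4E00) to "龯" (U+9FAF) range from the MySQL example.

```python
import re

# Assumption: strict ASCII is 0x00-0x7F, so anything outside it is non-ASCII.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7F]+")

# Assumption: same "一" (U+4E00) to "龯" (U+9FAF) range as the MySQL example.
HAN_RUN = re.compile(r"[\u4e00-\u9faf]+")


def non_ascii_runs(text: str) -> list:
    """Return every run of consecutive non-ASCII characters in text."""
    return NON_ASCII_RUN.findall(text)


def contains_han(text: str) -> bool:
    """Partial match: True if at least one character falls in the Han range."""
    return HAN_RUN.search(text) is not None


def is_all_han(text: str) -> bool:
    """Exact match: True only if the whole (non-empty) string is in the Han range."""
    return HAN_RUN.fullmatch(text) is not None


if __name__ == "__main__":
    sample = "繁體中文-简体中文-English-12345"
    print(non_ascii_runs(sample))
    print(contains_han(sample), is_all_han(sample))
```

<code>contains_han</code> corresponds to the partial-match recipes and <code>is_all_han</code> to the exact-match ones. Unlike the <code>\x80</code> variants, the strict class here also reports character 128 as non-ASCII.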