Regular expression: Difference between revisions

931 bytes added, 25 September

@@ Line 207: / Line 207: @@
 Find rows where <code>column_name</code> contains Chinese characters:
-<syntaxhighlight lang="sql">SELECT `column_name`
+<pre lang="sql">SELECT `column_name`
 FROM `table_name`
-WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';</syntaxhighlight>
+WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';</pre>
-<span id="find-non-ascii-characters-in-mysql"></span>
+Find rows where <code>column_name</code> contains only Chinese characters:
+<pre lang="sql">SELECT `column_name`
+FROM `table_name`
+WHERE `column_name` REGEXP '^[一-龯]+$';</pre>
+Explanation:
+* {{kbd | key=<nowiki>[一-龯]</nowiki>}} - Character set that matches all characters from "一" to "龯" in Unicode
+* "一" has Unicode code point {{kbd | key=<nowiki>U+4E00</nowiki>}}<ref>[https://www.compart.com/en/unicode/U+4E00 “一” U+4E00 CJK Unified Ideograph-4E00 Unicode Character]</ref>
+* "龯" has Unicode code point {{kbd | key=<nowiki>U+9FAF</nowiki>}}<ref>[https://www.compart.com/en/unicode/U+9FAF “龯” U+9FAF CJK Unified Ideograph-9FAF Unicode Character]</ref>
+* This range (U+4E00-U+9FAF) lies within the basic CJK Unified Ideographs block (U+4E00-U+9FFF), which already covers over 99% of everyday Chinese usage. [https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_B Extension B] and later blocks mainly contain ancient characters, variant characters, etc., which rarely appear in modern texts.
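As a rough cross-check of the code points listed above, the same character class can be tried outside MySQL. The following is a minimal Python sketch and not part of the original query. The helper name <code>is_only_chinese</code> is hypothetical, and Python's <code>re</code> module is only assumed to treat the class the same way a Unicode-aware MySQL <code>REGEXP</code> does.

```python
import re

# Same character class as the MySQL pattern '^[一-龯]+$'.
# fullmatch() anchors at both ends, so ^ and $ are not needed here.
ONLY_CJK = re.compile(r'[一-龯]+')


def is_only_chinese(text: str) -> bool:
    """Return True when every character lies between '一' and '龯'."""
    return ONLY_CJK.fullmatch(text) is not None


# Endpoints of the range, printed as code points.
range_start = hex(ord('一'))  # '0x4e00'
range_end = hex(ord('龯'))    # '0x9faf'

if __name__ == "__main__":
    print(range_start, range_end)
    print(is_only_chinese("中文"))        # only characters inside the range
    print(is_only_chinese("中文abc"))     # mixed with ASCII letters
    print(is_only_chinese("\U00020000"))  # an Extension B character, outside the range
```

MySQL releases before 8.0.4 evaluated <code>REGEXP</code> byte-wise, so a multi-byte class like this one may not behave the same way there; the <code>HEX</code>-based query above avoids that limitation.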
 ==== Find Non-ASCII Characters in MySQL ====