Regular expression: Difference between revisions

Find rows where <code>column_name</code> contains Chinese characters:


<pre lang="sql">SELECT `column_name`
FROM `table_name`
WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';</pre>
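
The pattern runs against the hexadecimal form of the stored bytes: <code>^(..)*</code> keeps the match aligned to whole bytes, and <code>E[4-9]</code> matches a UTF-8 lead byte between E4 and E9, which begins every three-byte sequence encoding U+4000 to U+9FFF. A minimal sketch, assuming both the column and the connection use <code>utf8mb4</code>; the literal '中' and the alias <code>has_cjk_lead_byte</code> are illustrative only:
<pre lang="sql">-- '中' is U+4E2D; its UTF-8 bytes are E4 B8 AD.
SELECT HEX('中');

-- The same test as the query above, applied to a literal instead of a column.
SELECT HEX('中') REGEXP '^(..)*(E[4-9])' AS has_cjk_lead_byte;</pre>
Because the lead bytes E4 to E9 start at U+4000, this test is slightly wider than the CJK Unified Ideographs block and can also match characters between U+4000 and U+4DFF.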
<span id="find-non-ascii-characters-in-mysql"></span>
 
Find rows where <code>column_name</code> contains only Chinese characters:
<pre lang="sql">SELECT `column_name`
FROM `table_name`
WHERE `column_name` REGEXP '^[一-龯]+$';</pre>
 
Explanation:
* {{kbd | key=<nowiki>[一-龯]</nowiki>}} - Character set that matches all characters from "一" to "龯" in Unicode
* "一" has Unicode code point {{kbd | key=<nowiki>U+4E00</nowiki>}}<ref>[https://www.compart.com/en/unicode/U+4E00 “一” U+4E00 CJK Unified Ideograph-4E00 Unicode Character]</ref>
* "龯" has Unicode code point {{kbd | key=<nowiki>U+9FAF</nowiki>}}<ref>[https://www.compart.com/en/unicode/U+9FAF “龯” U+9FAF CJK Unified Ideograph-9FAF Unicode Character]</ref>
* The range U+4E00-U+9FAF lies inside the CJK Unified Ideographs block (U+4E00-U+9FFF), which already covers over 99% of everyday Chinese usage. [https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_B Extension B] and later blocks mainly contain ancient Chinese characters, variant characters, etc., which rarely appear in modern texts.
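
The two end points of the range can be checked from MySQL itself. A minimal sketch, assuming a <code>utf8mb4</code> connection and a server that provides the <code>utf32</code> character set; converting to <code>utf32</code> makes <code>HEX()</code> print each character as its code point padded to eight hex digits (the aliases <code>range_start</code> and <code>range_end</code> are illustrative):
<pre lang="sql">-- Under the assumptions above this should return 00004E00 and 00009FAF.
SELECT HEX(CONVERT('一' USING utf32)) AS range_start,
       HEX(CONVERT('龯' USING utf32)) AS range_end;</pre>
The character-class query relies on the regular expression engine comparing whole characters. MySQL 8.0.4 and later use ICU, which handles multibyte characters; earlier versions match byte by byte, so on those servers the <code>HEX()</code> approach is likely the safer choice.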
 
==== Find Non-ASCII Characters in MySQL ====

