Find rows where <code>column_name</code> contains Chinese characters:
<pre lang="sql">SELECT `column_name`
FROM `table_name`
WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';</pre>
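To see why the pattern above looks for <code>E4</code> through <code>E9</code>, it can help to inspect the UTF-8 bytes of a few sample characters. The following is a minimal sketch, assuming the connection character set is <code>utf8mb4</code>; the sample literals and column aliases are illustrative only.

<pre lang="sql">-- Assumption: the connection character set is utf8mb4,
-- so each string literal is stored as UTF-8 bytes.
SELECT HEX('一') AS hex_u4e00,  -- E4B880 (lead byte E4)
       HEX('龯') AS hex_u9faf,  -- E9BEAF (lead byte E9)
       HEX('a')  AS hex_ascii;  -- 61 (single byte, never matches E[4-9])</pre>

The <code>(..)*</code> prefix keeps the match aligned on whole bytes (two hex digits each). Lead bytes <code>E4</code> to <code>E9</code> encode code points U+4000 to U+9FFF, which is slightly wider than the CJK Unified Ideographs block (U+4E00 to U+9FFF), so this query may also match a few characters outside that block, such as part of CJK Extension A and the Yijing hexagram symbols.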
Find rows where <code>column_name</code> contains only Chinese characters:
<pre lang="sql">SELECT `column_name`
FROM `table_name`
WHERE `column_name` REGEXP '^[一-龯]+$';</pre>
Explanation:
* {{kbd | key=<nowiki>[一-龯]</nowiki>}} - Character set that matches all characters from "一" to "龯" in Unicode
* "一" has Unicode code point {{kbd | key=<nowiki>U+4E00</nowiki>}}<ref>[https://www.compart.com/en/unicode/U+4E00 “一” U+4E00 CJK Unified Ideograph-4E00 Unicode Character]</ref>
* "龯" has Unicode code point {{kbd | key=<nowiki>U+9FAF</nowiki>}}<ref>[https://www.compart.com/en/unicode/U+9FAF “龯” U+9FAF CJK Unified Ideograph-9FAF Unicode Character]</ref>
* The matched range U+4E00-U+9FAF lies within the CJK Unified Ideographs block (U+4E00-U+9FFF) and already covers over 99% of everyday Chinese usage. [https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_B Extension B] and later blocks mainly contain ancient Chinese characters, variant characters, etc., which rarely appear in modern texts.
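
The two conditions can be compared side by side on a small test table. This is a sketch under stated assumptions: the table <code>sample_text</code>, its column <code>txt</code>, and the sample rows are hypothetical, the connection character set is assumed to be <code>utf8mb4</code>, and the second query assumes MySQL 8.0 or later, where <code>REGEXP</code> is Unicode-aware (older versions match byte by byte and may not handle multibyte ranges correctly).

<pre lang="sql">-- Hypothetical sample table; names and rows are illustrative only.
CREATE TEMPORARY TABLE `sample_text` (
  `txt` VARCHAR(50)
) CHARACTER SET utf8mb4;

INSERT INTO `sample_text` (`txt`) VALUES
  ('中文'),     -- only Chinese characters
  ('中文abc'),  -- Chinese mixed with ASCII
  ('hello');    -- ASCII only

-- Contains at least one Chinese character:
-- intended to return 中文 and 中文abc.
SELECT `txt`
FROM `sample_text`
WHERE HEX(`txt`) REGEXP '^(..)*(E[4-9])';

-- Contains only Chinese characters (assumes MySQL 8.0+):
-- intended to return 中文 only.
SELECT `txt`
FROM `sample_text`
WHERE `txt` REGEXP '^[一-龯]+$';</pre>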
==== Find Non-ASCII Characters in MySQL ====