Regular expression: Difference between revisions

From LemonWiki共筆
Jump to navigation Jump to search
Tags: Mobile edit Mobile web edit
mNo edit summary
 
(19 intermediate revisions by the same user not shown)
Line 1: Line 1:
透過正規表示法 (Regular Expression) 處理文字檔時,可以快速地搜尋或取代符合特定規則的字串。以每行為單位,進行字串處理<ref>[http://linux.vbird.org/linux_basic/0330regularex.php 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理]</ref>。 正規表示法 又稱正規表示式、正規表達式、正則表達式、正規表示法、正規運算式、規則運算式、常規表示法<ref>[https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F 正規表示式 - 維基百科,自由的百科全書]</ref>。
When processing text files through regular expressions, you can quickly search for or replace strings that match specific rules. Processing is done on a line-by-line basis for string manipulation. Regular expressions are also known as regex, regexp, or pattern matching expressions.


{{Raise hand | text = 有問題嗎?可以利用提供解說的[[Regular_expression#Regular_expression_online_tools | 線上工具]],嘗試自己除錯。 也可以到[http://www.ptt.cc/bbs/RegExp/index.html 看板 RegExp 文章列表 - 批踢踢實業坊]或其他[[問答服務]]詢問。 }}
{{LanguageSwitcher | content = [[Regular expression | English]], [[Regular expression in Mandarin|漢字]]}}


== 快速查表 ==
{{Raise hand | text = '''Need Help?''' You can use the provided explanatory [[#regular-expression-online-tools|online tools]] to try debugging yourself. }}
說明: (1) sample 藍色網底處代表符合規則的文字、(2) 同一文字規則可以有多種表示法
<table border="1" style="width:100%" class="wikitable">
<tr >
<th style="background-color: #E0E0E0;"> 文字規則 </th>
<th style="background-color: #E0E0E0; width:260px;"> sample </th>
<th style="background-color: #9c9ca3;"> 對立的文字規則 </th>
<th style="background-color: #9c9ca3; width:260px;"> sample</th>
</tr>
<tr>
<td> 任意一個文字(包含空白,但不包含換行符號) <br /> {{kbd | key = <nowiki>.</nowiki>}} </td>
<td><span style="background:#C6E3FF">W</span>hat Does the Fox Say? 12 狐狸怎叫 34</td>
<td></td>
<td></td>
</tr>
<tr>
<td> 任意文字(包含空白),出現1次或0次 <br /> {{kbd | key = <nowiki>.?</nowiki>}} = {{kbd | key = <nowiki>.{0,1}</nowiki>}}</td>
<td><span style="background:#C6E3FF">W</span>hat Does the Fox Say? 12 狐狸怎叫 34</td>
<td></td>
<td></td>
</tr>
<tr>
<td> 任意次的多個文字(包含空白) <br /> {{kbd | key = <nowiki>.*</nowiki>}} ={{kbd | key = <nowiki> .{0,}</nowiki>}}</td>
<td><span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span></td>
<td></td>
<td></td>
</tr>
<tr>
<td> 任意次的文字(包含空白),至少出現1次 <br /> {{kbd | key = <nowiki>.+</nowiki>}} = {{kbd | key = <nowiki>.{1,}</nowiki>}}</td>
<td><span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span></td>
<td></td>
<td></td>
</tr>
<tr>
<td> 任意次的空白或換行符號 (至少出現1次的空白或換行符號)  <br /> {{kbd | key = <nowiki>\s+</nowiki>}} </td>
<td>What<span style="background:#C6E3FF"> </span>Does the Fox Say? 12 狐狸怎叫 34</td>
<td>任意多個文字(不包含空白或換行符號) <br /> {{kbd | key = <nowiki>[^\s]+</nowiki>}} ={{kbd | key = <nowiki> [^\s]{1,}</nowiki>}} = {{kbd | key = <nowiki> [\S]+</nowiki>}} = {{kbd | key = <nowiki>[^ ]+</nowiki>}}</td>
<td><span style="background:#C6E3FF">What</span> Does the Fox Say? 12 狐狸怎叫 34</td>
</tr>
<tr>
<td> 任意次的 ASCII character (包含英文、數字和空白) [http://regexr.com/3aom2 demo]<ref>[http://www.asciitable.com/ Ascii Table - ASCII character codes and html, octal, hex and decimal chart conversion]</ref> <br /> {{kbd | key = <nowiki>[\x00-\x80]+</nowiki>}} 或 {{kbd | key = <nowiki>[[:ascii:]]+</nowiki>}}<ref>[https://stackoverflow.com/questions/24903140/regex-for-any-english-ascii-character-including-special-characters php - Regex for Any English ASCII Character Including Special Characters - Stack Overflow]</ref></td>
<td><span style="background:#C6E3FF">What Does the Fox Say? 12</span> 狐狸怎叫 34</td>
<td>非 ASCII,即中文出現任意次<br /> {{kbd | key = <nowiki>[^\x00-\x80]+</nowiki>}}</td>
<td>What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34</td>
</tr>
<tr>
<td> 任意次的大小寫英文、數字和底線符號( _ ) (不包含空白) ([https://regex101.com/r/gIKB6a/1 demo])<br /> {{kbd | key = <nowiki>[\w]+</nowiki>}} = {{kbd | key = <nowiki>[a-zA-Z0-9_]+</nowiki>}} <br /> PHP 加上 {{kbd | key =u}} 修飾語,則可支援中文字 </td>
<td><span style="background:#C6E3FF">What</span> <span style="background:#C6E3FF">Does</span> <span style="background:#C6E3FF">the</span> <span style="background:#C6E3FF">Fox</span> <span style="background:#C6E3FF">Say</span>? <span style="background:#C6E3FF">12</span> 狐狸怎叫 <span style="background:#C6E3FF">_34</span></td>
<td> 任意次的不是英文、數字和底線符號( _ )的文字 <br /> {{kbd | key = <nowiki>\W+</nowiki>}} = {{kbd | key = <nowiki>[^a-zA-Z0-9_]+</nowiki>}}</td>
<td>[http://regexr.com/3bk4v demo]</td>
</tr>
<tr>
<td> 任意次的數字(不包含空白) <br /> {{kbd | key = <nowiki>[\d]+</nowiki>}} = {{kbd | key = <nowiki>[0-9]+</nowiki>}}</td>
<td>What Does the Fox Say? <span style="background:#C6E3FF">12</span> 狐狸怎叫 34</td>
<td>不包含數字的任意次文字(包含空白  <br /> {{kbd | key = <nowiki>[^\d]+</nowiki>}} = {{kbd | key = <nowiki>[^0-9]+</nowiki>}} = {{kbd | key = <nowiki>\D+</nowiki>}} </td>
<td><span style="background:#C6E3FF">What Does the Fox Say? </span>12 狐狸怎叫 34</td>
</tr>
<tr>
<td> 任意次的中文字 <br /> {{kbd | key = <nowiki>[\p{Han}]+</nowiki>}} ([https://regex101.com/r/UYkdml/1 demo]、[[Regular expression#尋找中文、非英文的文字 | 詳細說明]])</td>
<td>What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34</td>
<td>不包含中文字的任意次文字  <br /> {{kbd | key = <nowiki>[^\p{Han}]+</nowiki>}} ([https://regex101.com/r/Nk9GdA/1 demo])</td>
<td></td>
</tr>
<tr>
<td> 以「狐狸」開頭的行 <br /> {{kbd | key = <nowiki>^狐狸.*$</nowiki>}}<ref>[http://www.regular-expressions.info/completelines.html Regex Examples: Matching Whole Lines of Text That Satisfy Certain Requirements]</ref></td>
<td>
<span style="background:#C6E3FF">狐狸怎叫 34 What Does the Fox Say?</span><br />
柴犬怎叫 What Does the shiba inu say?
</td>
<td>不以「狐狸」開頭的行  <br /> {{kbd | key = <nowiki>^(?!狐狸).*$</nowiki>}}<ref>[http://stackoverflow.com/questions/406230/regular-expression-to-match-text-that-doesnt-contain-a-word regex - Regular expression to match text that *doesn't* contain a word? - Stack Overflow]</ref> </td>
<td>
狐狸怎叫 34 What Does the Fox Say?<br />
<span style="background:#C6E3FF">柴犬怎叫 What Does the shiba inu say?</span>
</td>
</tr>
<tr>
<td> 以「怎叫」結尾的行 <br /> {{kbd | key = <nowiki>^.*怎叫$</nowiki>}}
<td>
What Does the Fox Say? 12 狐狸怎叫 34<br />
<span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
</td>
<td>不以「怎叫」結尾的行  <br /> {{kbd | key = <nowiki>.*(?<!怎叫)$</nowiki>}}<ref>[http://stackoverflow.com/questions/16398471/regex-not-ending-with Regex not ending with - Stack Overflow]</ref></td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br />
What Does the shiba inu say? 柴犬怎叫
</td>
</tr>
<tr>
<td> 包含「狐狸」的行 <br /> {{kbd | key = <nowiki>^.*狐狸.*$</nowiki>}} 或 {{kbd | key = <nowiki>(狐狸)</nowiki>}} ([https://regex101.com/r/UEtYst/1 demo])</td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br />
What Does the shiba inu say? 柴犬怎叫
</td>
<td>不包含「狐狸」的行 ([https://regex101.com/r/rvncjU/1 demo]) <br /> {{kbd | key = <nowiki>^((?!狐狸).)*$</nowiki>}} </td>
<td>
What Does the Fox Say? 12 狐狸怎叫 34<br />
<span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫 </span>
</td>
</tr>
<tr>
<td> 布林邏輯 AND: 包含「狐狸」和「叫」的行 ([http://regexr.com/3aokl demo])<ref>[http://stackoverflow.com/questions/469913/regular-expressions-is-there-an-and-operator regex - Regular Expressions: Is there an AND operator? - Stack Overflow]</ref><br /> {{kbd | key = <nowiki>(?=.*狐狸)(?=.*叫).*</nowiki>}} 或 {{kbd | key = <nowiki>狐狸.*叫|叫.*狐狸</nowiki>}}</td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br />
<span style="background:#C6E3FF">What Does the Fox Say? 12 不叫狐狸 34</span><br />
What Does the shiba inu say? 柴犬怎叫
</td>
<td></td>
<td></td>
</tr>
<tr>
<td> 布林邏輯 OR: 包含「狐狸」或「叫」的行 ([https://regexr.com/6cu06 demo])<br /> {{kbd | key = <nowiki>.*(狐狸|叫).*</nowiki>}}</td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34<br />
What Does the shiba inu say? 柴犬怎叫</span><br />
What Does the shiba inu say? 柴犬怎了
</td>
<td>布林邏輯: 不包含「狐狸」也不包含「柴犬」的行<br /> {{kbd | key = <nowiki>^((?!狐狸|柴犬).)*$</nowiki>}}</td>
<td>What Does the Fox Say? 12 狐狸怎叫 34<br />
What Does the shiba inu say? 柴犬怎叫<br />
<span style="background:#C6E3FF">What Does the Husky say? 哈士奇怎叫 </span></td>
</tr>
<tr>
<td> 布林邏輯 NOT: 不包含「狐狸」,但包含「柴犬」的行 ([http://regexr.com/3aokr demo])<ref>[http://stackoverflow.com/questions/2953039/regular-expression-for-a-string-containing-one-word-but-not-another regex - Regular expression for a string containing one word but not another - Stack Overflow]</ref><br /> {{kbd | key = <nowiki>^((?!狐狸).)*(柴犬).*$</nowiki>}} = {{kbd | key = <nowiki>^(柴犬).*((?!狐狸).)*$</nowiki>}} = {{kbd | key = <nowiki>(柴犬).*((?!狐狸).)*</nowiki>}} (如果句子同時存在狐狸和柴犬會出錯) </td>
<td>
What Does the Fox Say? 12 狐狸怎叫 34<br />
<span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
</td>
<td></td>
<td></td>
</tr>
</table>


== Regular expression online tools ==
測試 Regular expression 語法的網站
* {{Gd}} [http://regex101.com/ RegEx101] "Online regex tester and debugger: PHP, PCRE, Python, Golang and JavaScript" ([http://regex101.com/r/tH1eT7/1 example]) 有提供語法解說。教學: [https://www.minwt.com/webdesign-dev/html/20352.html RegEx101正規表示法線上產生器,有沒有選到立馬告訴你|梅問題.教學網]
* {{Gd}} [http://gskinner.com/RegExr/ RegExr]: Learn, Build, & Test RegEx ([http://regexr.com/395t0 example]). 有提供語法解說. 教學: [http://blog.hsdn.net/1426.html RegExr: 功能強大的正規式撰寫協助工具]
* [https://regexper.com/ Regexper]: 圖解方式提供語法解說 e.g. [https://regexper.com/#%5Cd%7B3%7D%28.*%29 \d{3}(.*)]
* [https://jex.im/regulex/ Regulex:JavaScript Regular Expression Visualizer] : 圖解方式提供語法解說 e.g. [https://jex.im/regulex/#!flags=&re=%5E(a%7Cb)*%3F%24 ^(a|b)*?$]
* [http://www.rubular.com/ Rubular]: a Ruby regular expression editor and tester ([http://www.rubular.com/r/UZuUT5pjeh example])
* [http://www.phpliveregex.com/ PHP Live Regex] {{access | date=2014-11-25}}
* [http://www.regextester.com/ Regex Tester and Debugger Online - Javascript, PCRE, PHP] {{access | date=2016-01-07}}
* [http://rocksaying.tw/archives/2670695.html Regular Expression (RegExp) in JavaScript - 石頭閒語] {{access | date=2017-11-14}}


Examples
== Quick Reference Table ==
* {{Gd}} [http://regexlib.com/ Regular Expression Library] 網友提供的 pattern 範例


== cases ==
Note: (1) Blue highlighted areas in samples represent text matching the rules, (2) The same text rule can have multiple representations
=== 取代換行符號為逗號 ===
 
將Email清單,轉成Email軟體可以使用的寄信名單
{| class="wikitable"
<pre>
|-
! Text Rule
aaa@email.com
! Sample
bbb@email.com
! Opposite Text Rule
ccc@email.com
! Sample
|-
| Any single character (including spaces, but not newline) <br> <code>.</code>
| <span style="background:#C6E3FF">W</span>hat Does the Fox Say? 12 狐狸怎叫 34
|
|
|-
| Any character (including spaces), appears 1 or 0 times <br> <code>.?</code> = <code>.{0,1}</code>
| <span style="background:#C6E3FF">W</span>hat Does the Fox Say? 12 狐狸怎叫 34
|
|
|-
| Any number of multiple characters (including spaces) <br> <code>.*</code> = <code>.{0,}</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span>
|
|
|-
| Any number of characters (including spaces), at least 1 occurrence <br> <code>.+</code> = <code>.{1,}</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span>
|
|
|-
| Any number of spaces or newlines (at least 1 occurrence) <br> <code>\s+</code>
| What<span style="background:#C6E3FF"> </span>Does the Fox Say? 12 狐狸怎叫 34
| Any number of characters (not including spaces or newlines) <br> <code>[^\s]+</code> = <code>[^\s]{1,}</code> = <code>[\S]+</code> = <code>[^ ]+</code>
| <span style="background:#C6E3FF">What</span> Does the Fox Say? 12 狐狸怎叫 34
|-
| Any number of ASCII characters (including English, numbers and spaces) <br> <code>[\x00-\x80]+</code> or <code>[[:ascii:]]+</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12</span> 狐狸怎叫 34
| Non-ASCII, i.e., Chinese characters appearing any number of times <br> <code>[^\x00-\x80]+</code>
| What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34
|-
| Any number of uppercase/lowercase English letters, numbers and underscore (_) (not including spaces) <br> <code>[\w]+</code> = <code>[a-zA-Z0-9_]+</code> <br> PHP with <code>u</code> modifier supports Chinese characters
| <span style="background:#C6E3FF">What</span> <span style="background:#C6E3FF">Does</span> <span style="background:#C6E3FF">the</span> <span style="background:#C6E3FF">Fox</span> <span style="background:#C6E3FF">Say</span>? <span style="background:#C6E3FF">12</span> 狐狸怎叫 <span style="background:#C6E3FF">_34</span>
| Any number of characters that are not English letters, numbers and underscore (_) <br> <code>\W+</code> = <code>[^a-zA-Z0-9_]+</code>
|
|-
| Any number of digits (not including spaces) <br> <code>[\d]+</code> = <code>[0-9]+</code>
| What Does the Fox Say? <span style="background:#C6E3FF">12</span> 狐狸怎叫 34
| Any number of characters not including digits (including spaces) <br> <code>[^\d]+</code> = <code>[^0-9]+</code> = <code>\D+</code>
| <span style="background:#C6E3FF">What Does the Fox Say? </span>12 狐狸怎叫 34
|-
| Any number of Chinese characters <br> <code>[\p{Han}]+</code>
| What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34
| Any number of characters not including Chinese <br> <code>[^\p{Han}]+</code>
|
|-
| Lines starting with “狐狸” <br> <code>^狐狸.*$</code>
| <span style="background:#C6E3FF">狐狸怎叫 34 What Does the Fox Say?</span><br>柴犬怎叫 What Does the shiba inu say?
| Lines not starting with “狐狸” <br> <code>^(?!狐狸).*$</code>
| 狐狸怎叫 34 What Does the Fox Say?<br><span style="background:#C6E3FF">柴犬怎叫 What Does the shiba inu say?</span>
|-
| Lines ending with “怎叫” <br> <code>^.*怎叫$</code>
| What Does the Fox Say? 12 狐狸怎叫 34<br><span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
| Lines not ending with “怎叫” <br> <code>.*(?&lt;!怎叫)$</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br>What Does the shiba inu say? 柴犬怎叫
|-
| Lines containing “狐狸” <br> <code>^.*狐狸.*$</code> or <code>(狐狸)</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br>What Does the shiba inu say? 柴犬怎叫
| Lines not containing “狐狸” <br> <code>^((?!狐狸).)*$</code>
| What Does the Fox Say? 12 狐狸怎叫 34<br><span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
|-
| Boolean logic AND: Lines containing both “狐狸” and “叫” <br> <code>(?=.*狐狸)(?=.*叫).*</code> or <code>狐狸.*叫\|叫.*狐狸</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br><span style="background:#C6E3FF">What Does the Fox Say? 12 不叫狐狸 34</span><br>What Does the shiba inu say? 柴犬怎叫
|
|
|-
| Boolean logic OR: Lines containing “狐狸” or “叫” <br> <code>.*(狐狸\|叫).*</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34<br>What Does the shiba inu say? 柴犬怎叫</span><br>What Does the shiba inu say? 柴犬怎了
| Boolean logic: Lines not containing “狐狸” and not containing “柴犬” <br> <code>^((?!狐狸\|柴犬).)*$</code>
| What Does the Fox Say? 12 狐狸怎叫 34<br>What Does the shiba inu say? 柴犬怎叫<br><span style="background:#C6E3FF">What Does the Husky say? 哈士奇怎叫</span>
|-
| Boolean logic NOT: Lines not containing “狐狸” but containing “柴犬” <br> <code>^((?!狐狸).)*(柴犬).*$</code> = <code>^(柴犬).*((?!狐狸).)*$</code> = <code>(柴犬).*((?!狐狸).)*</code>
| What Does the Fox Say? 12 狐狸怎叫 34<br><span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
|
|
|}


改成
</pre>


==== 方案1: Sublime Text, EmEditor ====
== Regular Expression Online Tools ==
語法適用 [http://www.sublimetext.com/ Sublime Text], [http://www.emeditor.com/ EmEditor]軟體 (以下為 EmEditor 的操作說明)
# Menu: Search -> Replace
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>\n</nowiki>}} ([[Return symbol | 換行符號]] 。{{Win}} 作業系統的換行符號是 {{kbd | key = <nowiki>\r\n</nowiki>}}、{{Mac}} 作業系統的換行符號是 {{kbd | key = <nowiki>\n</nowiki>}},取兩者共有的符號。如果使用 {{Linux}} 作業系統的換行符號是 {{kbd | key = <nowiki>\r</nowiki>}}。 )
## Replace with: {{kbd | key = <nowiki>, </nowiki>}}
# click "Replace all"


Websites for testing regular expression syntax:
* {{Gd}} [http://regex101.com/ RegEx101] - “Online regex tester and debugger: PHP, PCRE, Python, Golang and JavaScript” - Provides syntax explanations
* {{Gd}} [http://gskinner.com/RegExr/ RegExr] - Learn, Build, &amp; Test RegEx - Provides syntax explanations
* [https://regexper.com/ Regexper] - Visual explanation of syntax using diagrams
* [https://jex.im/regulex/ Regulex:JavaScript Regular Expression Visualizer] - JavaScript Regular Expression Visualizer - Visual explanation using diagrams
* [http://www.rubular.com/ Rubular] - A Ruby regular expression editor and tester
* [http://www.phpliveregex.com/ PHP Live Regex]
* [http://www.regextester.com/ Regex Tester and Debugger Online] - JavaScript, PCRE, PHP


<div style="float: left; width: 100%; position: relative; display: block; clear: left;">
<div style="width: 46%;  float: left; margin:0 auto; position: relative; display: block; ">
===== 將每行的文字,移除換行,並且都加上逗號分隔 =====
<pre>
// before
Elmo
Emie
Granny Bird


// after
Elmo, Emie, Granny Bird
</pre>
方法: 使用 [http://www.sublimetext.com/ Sublime Text] 或 [https://zh-tw.emeditor.com/ EmEditor]。
* Find what: {{kbd | key = <nowiki>\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>, </nowiki>}} 此例是將每行的文字,都加上逗號+空格分隔 (如果要用別的符號分隔,例如頓號分隔,則是 Replace with: {{kbd | key = <nowiki>、</nowiki>}})


</div>
== Common Use Cases ==




<div style="width: 46%; float: left; margin:0 auto; position: absolute; display: block; left: 54%; top: 0;">
=== Replace Newlines with Commas ===


===== 將逗號分隔的文字,還原成逐行顯示,並且移除分隔符號 (,) =====
Converting email lists into a format usable by email software:
<pre>
// before
Elmo, Emie, Granny Bird


// after
<pre>Original:
Elmo
aaa@email.com
Emie
bbb@email.com
Granny Bird
ccc@email.com
</pre>
方法: 使用 [http://www.sublimetext.com/ Sublime Text] 或 [https://zh-tw.emeditor.com/ EmEditor]。{{exclaim}} 輸出結果的每行前面可能會有空白
* Find what: {{kbd | key = <nowiki>([^,]+),</nowiki>}}
* Replace with: {{kbd | key = <nowiki>\1\n</nowiki>}}


</div>
Convert to:
</div>
<div style="clear:both;">&nbsp;</div>


==== 方案2: Notepad++ ====
==== Method 1: Sublime Text, EmEditor ====
使用[http://notepad-plus-plus.org/ Notepad++]軟體
# 選單: 尋找 -> 取代
# 搜尋模式: 勾選「增強模式」 (不是勾選「用類型表式」)
## 尋找目標: {{kbd | key = <nowiki>\n</nowiki>}} (換行符號)
## 取代成: {{kbd | key = <nowiki>, </nowiki>}}
# 勾選全部取代


相關資料: [http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Replacing_Newlines How To Replace Line Ends, thus changing the line layout] last visited: 2010-01-27
# Menu: Search -&gt; Replace
# Check “Use Regular Expression”
#* Find: <code>\n</code> (newline character)
#* Replace with: <code>,</code>
# Click “Replace all”


==== 方案3: Microsoft Word ====
使用Microsoft Word 2002軟體
# 選單: 編輯 -> 取代
# 勾選增強模式
## 尋找目標: {{kbd | key = <nowiki>^p</nowiki>}} (段落標記)
## 取代為: {{kbd | key = <nowiki>, </nowiki>}}
# 勾選全部取代


==== 方案4: Sed command for linux ====
==== Method 2: Notepad++ ====


{{kbd | key=<nowiki>sed 's/要被取代的字串/新的字串/g' old.filename > new.filename</nowiki>}}<ref>[http://linux.vbird.org/linux_basic/0330regularex.php#sed_replace 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理]</ref>
# Menu: Find -&gt; Replace
# Search mode: Check “Extended mode” (not “Regular expression”)
#* Find: <code>\n</code>
#* Replace with: <code>,</code>
# Click “Replace All”


(1)要被取代的字串: :a;N;$!ba;s/\n
(2)新的字串: ;


{{kbd | key=<nowiki>sed ':a;N;$!ba;s/\n/; /g' old.filename > new.filename</nowiki>}} <ref>參考 [http://stackoverflow.com/questions/1251999/sed-how-can-i-replace-a-newline-n unix - sed: How can I replace a newline? ]</ref>
==== Method 3: Microsoft Word ====


==== 方案5: 使用支援十六進位編輯 (HEX) 的編輯軟體 ====
# Menu: Edit -&gt; Replace
# Check extended mode
#* Find: <code>^p</code> (paragraph mark)
#* Replace with: <code>,</code>
# Click “Replace All”


使用支援十六進位編輯 (HEX) 的編輯軟體,例如: [https://itunes.apple.com/tw/app/ihex-hex-editor/id909566003?mt=12 ‎iHex - Hex Editor] for {{Mac}}
==== Method 4: Sed command for Linux ====
# 選單 Edit -> Find
# Find: {{kbd | key=<nowiki>0A</nowiki>}} 換行符號
# Replace: {{kbd | key=<nowiki>2c 20</nowiki>}} 其中 2c 是逗號, 20 是空白
# 儲存檔案


相關資料
<syntaxhighlight lang="bash">sed ':a;N;$!ba;s/\n/; /g' old.filename > new.filename</syntaxhighlight>


* [https://www.hexdictionary.com/ Hex Dictionary | Convert Hex / Hexadecimal Numbers to Binary and Decimal]
=== Find IP Addresses (IPv4) ===


=== Find IP address (IPv4) ===
For Notepad++ v.5.9.5: - Find: <code>\d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?</code>
適用 [http://notepad-plus-plus.org/ Notepad++] 軟體 v.5.9.5
# 選單: 尋找 -> 取代
# 搜尋模式: 勾選「用類型表式」
## 尋找目標: {{kbd | key=<nowiki>\d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?</nowiki>}}


note: not support {n} syntax
For Sublime Text v. 3.2.21: - Find: <code>(?:\d{1,3}\.){3}\d{1,3}</code>


適用 [https://www.sublimetext.com/ Sublime Text] v. 3.2.21
=== Remove Black Squares (UNIX Line Endings LF) ===
# Find: {{kbd | key=<nowiki>(?:\d{1,3}\.){3}\d{1,3}</nowiki>}}


參考資料:  
Using Notepad++: 1. Menu: Find -&gt; Replace 2. Search mode: Check “Extended mode” - Find: <code>\n\n</code> (2 LF characters) - Replace with: <code>\r\n</code> (CR and LF)
* [https://www.regular-expressions.info/ip.html How to Find or Validate an IP Address] {{access | date = 2019-06-05}}
* [http://sourceforge.net/projects/notepad-plus/forums/forum/331754/topic/4780602 SourceForge.net: Notepad++: Regular expression for IP addresses]
* [http://stackoverflow.com/questions/53497/regular-expression-that-matches-valid-ipv6-addresses regex - Regular expression that matches valid IPv6 addresses - Stack Overflow] {{access | date = 2015-08-10}}


=== 移除記事本純文字檔的黑色方塊(UNIX系統的換行符號 LF ) ===
=== Add Quotes Around Elements ===
使用notepad++軟體
# 選單: 尋找 -> 取代
# 搜尋模式: 勾選「增強模式」
## 尋找目標: \n\n  (註: 2個LF )
## 取代成: \r\n  (註: CR與LF )


用記事本打開純文字檔時,就不會看到黑色方塊
==== Add Quotes Around Array Elements ====


<pre>Before: Elmo, Emie, Granny Bird, Herry Monster, 喀喀獸
After: 'Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸'</pre>
'''Method 1: PHP'''


=== 將每項元素,加上引號框起來 ===
<syntaxhighlight lang="php">$users = array('Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸');
==== 將陣列的每項元素,都加上引號框起來 ====
// Single quotes around each element
<pre>
Elmo, Emie, Granny Bird, Herry Monster, 喀喀獸
修改成
'Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸'
</pre>
方法1: 使用 PHP
{{exclaim}} 如果元素包含換行符號,不能用下面方法處理。
<pre>
$users = array('Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸');
//「單引號」相隔每個元素
$result = implode(",", preg_replace('/^(.*?)$/', "'$1'", $users));
$result = implode(",", preg_replace('/^(.*?)$/', "'$1'", $users));
 
// Double quotes around each element
//「雙引號」相隔每個元素
$result = implode(",", preg_replace('/^(.*?)$/', "\"$1\"", $users));
$result = implode(",", preg_replace('/^(.*?)$/', "\"$1\"", $users));
echo $result;
echo $result;</syntaxhighlight>
</pre>
'''Method 2: Sublime Text or EmEditor''' - Find: <code>([^\s|,]+)</code> - Replace with: <code>'\1'</code> (for single quotes) or <code>&quot;\1&quot;</code> (for double quotes)


Thanks, Joshua! More on [http://melikedev.com/2010/02/24/php-wrap-implode-array-elements-in-quotes/ PHP - Wrap Implode Array Elements in Quotes » Me Like Dev]
'''Method 3: Notepad++''' (Enable “Regular expression” search mode) - Find: <code>([^\s|,]+)</code> - Replace with: <code>'$1'</code> (for single quotes) or <code>&quot;$1&quot;</code> (for double quotes)


方法2: 使用 [http://www.sublimetext.com/ Sublime Text] 或 [https://zh-tw.emeditor.com/ EmEditor]
=== Find Non-ASCII Characters (Chinese/Non-English Text) ===
* Find: {{kbd | key = <nowiki>([^\s|,]+)</nowiki>}}
* 分隔符號
**「單引號」相隔每個元素 Replace with: {{kbd | key = <nowiki>'\1'</nowiki>}}
**「雙引號」相隔每個元素 Replace with: {{kbd | key = <nowiki>"\1"</nowiki>}}


方法3: 使用 [https://notepad-plus-plus.org/ Notepad++]。啟用搜尋模式的「用類型表式」
* Find: {{kbd | key = <nowiki>([^\s|,]+)</nowiki>}}
* 分隔符號
**「單引號」相隔每個元素 Replace with: {{kbd | key = <nowiki>'$1'</nowiki>}}
**「雙引號」相隔每個元素 Replace with: {{kbd | key = <nowiki>"$1"</nowiki>}}


==== In LibreOffice ====


<div style="float: left; width: 100%; position: relative; display: block; clear: left;">
<pre>[^\u0000-\u0080]+</pre>
<div style="width: 46%;  float: left; margin:0 auto; position: relative; display: block; ">


==== 將每行的文字,都加上引號框起來,並且移除換行 ====
<pre>
// before
Elmo
Emie
Granny Bird


// after
==== Find Chinese Characters in Google Sheets ====
'Elmo', 'Emie', 'Granny Bird'
</pre>
方法1: 使用 [http://www.sublimetext.com/ Sublime Text] 、[https://notepad-plus-plus.org/downloads/ Notepad++] 或 [https://zh-tw.emeditor.com/ EmEditor]。該方法有處理每行的前面或後面可能有一格或多格空白


如果使用 {{Mac}} 作業系統
Example: If cell {{kbd | key=A2}} contains any Chinese character, display “Chinese”, otherwise display “English”:
* Find what: {{kbd | key = <nowiki>(\S+)(\s?)+$\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}} <br />(如果要使用雙引號框起來,則是 Replace with: {{kbd | key = <nowiki>"\1", </nowiki>}})


如果使用 {{Win}} 作業系統,需要修改換行符號 {{kbd | key = <nowiki>\n</nowiki>}} 為 {{kbd | key = <nowiki>\r\n</nowiki>}}
<pre>=IF(REGEXMATCH(A2, &quot;[\一-\龥]&quot;), &quot;Chinese&quot;, &quot;English&quot;)</pre>
* Find what: {{kbd | key = <nowiki>(\S+)(\s?)+$\r\n</nowiki>}} on {{Mac}}
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}} <br />(如果要使用雙引號框起來,則是 Replace with: {{kbd | key = <nowiki>"\1", </nowiki>}})


==== Find Non-ASCII Characters in Google Sheets ====
Extract non-ASCII characters (such as Chinese, Japanese, emoji, etc.) from cell {{kbd | key=A2}}
<pre>
=IF(ISERROR(REGEXEXTRACT(A2, "[^\x00-\x80]+")), "", REGEXEXTRACT(A2, "[^\x00-\x80]+"))
</pre>


方法2: 使用 [http://www.sublimetext.com/ Sublime Text] 或 [https://zh-tw.emeditor.com/ EmEditor] {{exclaim}} 該方法沒有處理每行的後面可能有一格或多格空白
Explanation of regular expression {{kbd | key=<nowiki>[^\x00-\x80]+</nowiki>}}
* Find what: {{kbd | key = <nowiki>(.*)$\n</nowiki>}} 或 {{kbd | key = <nowiki>(\S+)$\n</nowiki>}} 或 {{kbd | key = <nowiki>(\S+)\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}}


More details on the page [[Add quotation at the start and end of each line | add quotation at the start and end of each line]].
* {{kbd | key=<nowiki>[\x00-\x80]</nowiki>}}: Represents the ASCII character range (character codes 0-128). (1) Standard ASCII range: 0-127 ({{kbd | key=<nowiki>0x00-0x7F</nowiki>}} aka * {{kbd | key=<nowiki>[\x00-\x7F]</nowiki>}})<ref>[https://www.commfront.com/pages/ascii-chart ASCII Chart – CommFront]</ref> (2) Character 128 (({{kbd | key=<nowiki>0x80</nowiki>}}) is actually the first character in the extended ASCII range, not part of the original ASCII standard.<ref>[https://en.wikipedia.org/wiki/UTF-8 UTF-8 - Wikipedia]</ref><ref>[https://en.wikipedia.org/wiki/Control_character Control character - Wikipedia]</ref>
* {{kbd | key=<nowiki>[^...]</nowiki>}}: Means "not" these characters
* {{kbd | key=<nowiki>+</nowiki>}}: Means one or more


</div>
Overall meaning: Matches one or more non-ASCII characters


==== Find Chinese Characters in MySQL ====


<div style="width: 46%; float: left; margin:0 auto; position: absolute; display: block; left: 54%; top: 0;">
Find rows where <code>column_name</code> contains Chinese characters:


==== 將引號框起來的文字,還原成逐行顯示,並且移除分隔符號 (,) ====
<pre lang="sql">SELECT `column_name`
<pre>
FROM `table_name`
// before
WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';</pre>
'Elmo', 'Emie', 'Granny Bird'


// after
Query condition used to match records where the <code>column_name</code> field contains only Chinese characters.
Elmo
<pre lang="sql">SELECT `column_name`
Emie
FROM `table_name`
Granny Bird
WHERE `column_name` REGEXP '^[一-龯]+$';</pre>
</pre>
方法: 使用 [http://www.sublimetext.com/ Sublime Text] 或 [https://zh-tw.emeditor.com/ EmEditor]。該方法有處理每行的前面或後面可能有一格或多格空白
* Find what: {{kbd | key = <nowiki>'(([^,|^'])+)',?\s?</nowiki>}}
* Replace with: {{kbd | key = <nowiki>\1\n</nowiki>}}


</div>
Explanation:
</div>
* {{kbd | key=<nowiki>[一-龯]</nowiki>}} - Character set that matches all characters from "一" to "龯" in Unicode
* "一" has Unicode code point {{kbd | key=<nowiki>U+4E00</nowiki>}}<ref>[https://www.compart.com/en/unicode/U+4E00 “一” U+4E00 CJK Unified Ideograph-4E00 Unicode Character]</ref>
* "龯" has Unicode code point {{kbd | key=<nowiki>U+9FEF</nowiki>}}<ref>[https://www.compart.com/en/unicode/U+9FAF “龯” U+9FAF CJK Unified Ideograph-9FAF Unicode Character]</ref>
* This range U+4E00-U+9FFF already covers over 99% of daily Chinese usage requirements [https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_B Extension B] and later blocks mainly contain ancient Chinese characters, variant characters, etc., which rarely appear in modern texts


<div style="clear:both;">&nbsp;</div>
==== Find Non-ASCII Characters in MySQL ====


Find rows where <code>column_name</code> is not entirely ASCII characters:


=== 將試算表欄位值前後,加上雙引號框起來 ===
<syntaxhighlight lang="sql">SELECT `column_name`
* [https://errerrors.blogspot.com/2019/03/how-to-enclose-non-empty-cell-with-double-quotes-in-google-spreadsheet.html Google 試算表的文字類型欄位值的前後加上雙引號]
 
=== Find non-ASCII characters 尋找中文、非英文的文字 ===
==== Find non-ASCII characters in Google sheet ====
適用: Google Drive 試算表的 Regular expression 相關函數,例如: [https://support.google.com/docs/answer/3098292?hl=zh-Hant REGEXMATCH]、[https://support.google.com/docs/answer/3098244?hl=en REGEXEXTRACT]、[https://support.google.com/docs/answer/3098245?hl=zh-Hant RegExReplace] 函數、Notepad++的搜尋
<pre>
[^\x00-\x80]+
</pre>
 
==== Find non-ASCII characters in LibreOffice ====
適用: [https://zh-tw.libreoffice.org/ LibreOffice] [https://help.libreoffice.org/6.2/en-US/text/scalc/01/func_regex.html REGEX] function<ref>[https://help.libreoffice.org/6.2/en-US/text/shared/01/02100001.html?&DbPAR=WRITER&System=MAC List of Regular Expressions]</ref>、Total commander 的 Multi-Rename tool<ref>取代非英文的文字,但是不包含 . 符號: <nowiki>[^\u0000-\u0080|.]+ </nowiki></ref><ref>[http://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters javascript - Regular expression to match non-english characters? - Stack Overflow]</ref>
<pre>
[^\u0000-\u0080]+
</pre>
 
==== Find Chinese characters in Google sheet ====
範例:如果 A2 包含任一中文字,則欄位值顯示「中文」。如果未包含任何中文字,則欄位值顯示「英文」:
<pre>
=IF(REGEXMATCH(A2, "[\一-\龥]"), "中文", "英文")
</pre>
 
{{exclaim}} Google 不支援以下語法,會顯示「... 是無效的規則運算式。」錯誤
* {{kbd | key=<nowiki>[\u4e00-\u9fa5]</nowiki>}}
* {{kbd | key=<nowiki>[^\u4e00-\u9fa5]</nowiki>}}
* {{kbd | key=<nowiki>[\p{Script=Hans}]</nowiki>}}
* {{kbd | key=<nowiki>[\p{Han}]</nowiki>}}
 
==== Find Chinese characters in MySQL ====
尋找 `column_name` 欄位值包含中文字。適用: MySQL<ref>[https://stackoverflow.com/questions/9795137/how-to-detect-rows-with-chinese-characters-in-mysql How to detect rows with chinese characters in MySQL? - Stack Overflow]</ref><ref>[https://stackoverflow.com/questions/401771/how-can-i-find-non-ascii-characters-in-mysql How can I find non-ASCII characters in MySQL? - Stack Overflow]</ref>
<pre>
SELECT `column_name`
FROM `table_name`
FROM `table_name`
WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';
WHERE `column_name` <> CONVERT(`column_name` USING ASCII)</syntaxhighlight>
</pre>


<pre>
==== Find Chinese Characters in PHP ====
SELECT `column_name`
FROM `table_name`
WHERE HEX(`column_name`) REGEXP '(E[4-9])';
</pre>


==== Find non-ASCII characters in MySQL ====
'''Exact match:'''
尋找 `column_name` 欄位值不完全是 ASCII 字元
<pre>
SELECT `column_name`
FROM `table_name`
WHERE `column_name` <> CONVERT(`column_name` USING ASCII)
</pre>


==== Find non-ASCII characters in PHP ====
<syntaxhighlight lang="php">// Approach 1
尋找欄位值包含中文字,中文字包含繁體中文與簡體中文,不包含標點符號 (例如 {{kbd | key = <nowiki>,</nowiki>}})、全形標點符號 (例如 {{kbd | key = <nowiki>,</nowiki>}})以及特殊符號,例如 Emoji:{{kbd | key = ⭐}}。
PHP: exact match
<pre>
// approach 1
if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $string)) {
if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $string)) {
echo "全部文字都是中文字" . PHP_EOL;
    echo "All text is Chinese characters" . PHP_EOL;
}else{
} else {
echo "部分文字不是中文字" . PHP_EOL;
    echo "Some text is not Chinese characters" . PHP_EOL;
}
}


// approach 2
// Approach 2
if (preg_match('/^[\p{Han}]+$/u', $string)) {
if (preg_match('/^[\p{Han}]+$/u', $string)) {
echo "全部文字都是中文字" . PHP_EOL;
    echo "All text is Chinese characters" . PHP_EOL;
}else{
} else {
echo "部分文字不是中文字" . PHP_EOL;
    echo "Some text is not Chinese characters" . PHP_EOL;
}
}</syntaxhighlight>
</pre>
'''Partial match:'''


partial match ([http://sandbox.onlinephpfunctions.com/code/d780845d20877c0fd2e693b28ed02a10d250d39e online demo] hosted by [http://sandbox.onlinephpfunctions.com/ PHP Sandbox])
<syntaxhighlight lang="php">// Approach 1
<pre>
// approach 1
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\p{Han}]+/u';
$pattern = '/[\p{Han}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);
var_dump($matches);


// approach 2
// Approach 2
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\x{4e00}-\x{9fa5}]+/u';
$pattern = '/[\x{4e00}-\x{9fa5}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);</syntaxhighlight>


var_dump($matches);
=== Find ASCII Characters in PHP ===
</pre>
 
技術問題除錯: 錯誤訊息
<pre>preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 8</pre>


解決方式: [http://php.net/manual/en/function.preg-match.php preg_match()] 需要加上 {{kbd | key = u }} 變數<ref>[https://stackoverflow.com/questions/32375531/preg-match-compilation-failed-character-value-in-x-or-o-is-too-large-a php - preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 27 on line number 25 - Stack Overflow]</ref>。
'''Code I:'''


==== Find non-ASCII characters in JavaScript ====
<syntaxhighlight lang="php">if (preg_match('/[^\x20-\x7f]/', $keyword) === 0) {
* [https://stackoverflow.com/questions/21109011/javascript-unicode-string-chinese-character-but-no-punctuation regex - Javascript unicode string, chinese character but no punctuation - Stack Overflow]
    echo "The keyword is ASCII only";
} else {
    echo "The keyword contains non-ASCII characters (like Chinese, Japanese, etc.)";
}</syntaxhighlight>
'''Code II:'''


參考資料:  
<syntaxhighlight lang="php">$pattern = '/^[[:ascii:]]+$/i';
* [http://blog.csdn.net/tinyletero/article/details/8201465 unicode编码 \u4e00-\u9fa5 匹配所有中文 - CSDN博客]
$text = "Hello World"; // ASCII only
* [https://stackoverflow.com/questions/38168419/codeigniter-form-validation-for-chinese-words php - CodeIgniter Form Validation for Chinese Words - Stack Overflow]
if (preg_match($pattern, $text)) {
* [https://zh.wikipedia.org/zh-tw/%E4%B8%AD%E6%97%A5%E9%9F%93%E7%B5%B1%E4%B8%80%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97%E5%88%97%E8%A1%A8 中日韓統一表意文字列表 - 維基百科,自由的百科全書]
    echo "Pure ASCII characters";
} else {
    echo "Contains non-ASCII characters";
}</syntaxhighlight>


=== 尋找英文字 ===
=== Remove Empty Lines ===


'''Original:'''


==== 尋找 ASCII 字元 in MySQL ====
<pre>Neo
<pre>
Trinity
-- 尋找欄位 `my_column` 欄位值是 ASCII 字元


SELECT *
Morpheus
FROM `my_table`
WHERE `my_column` LIKE CONVERT(`my_column` USING ASCII)
</pre>


相關文章
* [https://errerrors.blogspot.com/2020/07/search-app-not-apple-in-englishsearching.html 解決英文字的搜尋:搜尋 app 而不是 apple]


參考資料
Smith
* [https://stackoverflow.com/questions/401771/how-can-i-find-non-ascii-characters-in-mysql How can I find non-ASCII characters in MySQL? - Stack Overflow]
Oracle</pre>
'''After:'''


==== 尋找英文字、數字、破折號(-)或底線(_)字元 in MySQL ====
<pre>Neo
<pre>
Trinity
-- 尋找欄位 `my_column` 欄位值是包含英文字、數字、破折號(-)或底線(_)的字串
Morpheus
 
Smith
SELECT *
Oracle</pre>
FROM `my_table`
'''Using Sublime Text &amp; EmEditor:''' - Find: <code>^[\s\t]*$\n</code> - Replace with: (empty)
WHERE `my_column` REGEXP '[a-zA-Z0-9\-_]'
</pre>
 
=== 將每行文字的行頭加上逗號符號 ===
使用notepad++軟體
# 選單: 尋找 -> 取代
# 搜尋模式: 勾選「用類型表示」
## 尋找目標: {{kbd | key=(.*)}} 或者是 {{kbd | key=^(.*)$}}
## 取代成: {{kbd | key=,\1}} 或者是 {{kbd | key=,$1}}。
 
參考資料: [http://stackoverflow.com/questions/8413237/notepad-regex-search-replace-how-to-append-and-prepend-a-character-at-start-a Notepad++ RegEx Search/Replace: How to append and prepend a character at start and end of each file line? - Stack Overflow]
 
=== 知道前面跟後面的文字,但是中間文字忘記了 ===
使用notepad++軟體
# 選單: 尋找 -> 取代
# 搜尋模式: 勾選「用類型表示」
## 尋找目標: {{kbd | key=a(.*)le}} 就可以找到(1)apple (2)apps lesson ... 等a開頭、le結尾的文字,中間可夾雜空白。 {{exclaim}} 中文字串搜尋,建議將文件的編碼改成 UTF-8 編碼
 
 
=== 移除空白行 ===
<pre>
# (原) 每行可能間隔一行空白或多行空白
尼歐
崔妮蒂
 
莫斐斯
 
 
史密斯
祭師
 
# (後) 改成每行逐行緊接著
尼歐
崔妮蒂
莫斐斯
史密斯
祭師
</pre>


移除一行空白或多行空白( 行內可能包含一個或多個空白字元 {{kbd | key= SPACE}} 、定位鍵{{kbd | key= TAB}})
'''Using Notepad++ v7.8.7:''' - Menu: Edit -&gt; Line Operations -&gt; Remove Empty Lines (Including Blank Lines)
* 使用工具:  適用 Sublime Text 與 EmEditor 軟體,需勾選「使用規則運算式」。{{exclaim}} 以下語法不適用於 Notepad++ 軟體<ref>[http://www.sitepoint.com/forums/showthread.php?448843-Regex-delete-multiple-blank-lines Regex: delete multiple blank lines]</ref>
** 尋找: {{kbd | key=<nowiki>^[\s\t]*$\n</nowiki>}} --> 取代為: 空 (不需要輸入任何字)
* 使用工具: Notepad++ v7.8.7
** Notepad++ 軟體選單: 編輯 -> 行處理 -> 移除空行(包括只有空白字元的行)<ref>[http://stackoverflow.com/questions/3866034/removing-empty-lines-in-notepad regex - Removing empty lines in Notepad++ - Stack Overflow]</ref>
* 詳細說明,請見 [[Regular replace blank lines]]


=== 尋找非空白的文字 ===
=== Find Non-Whitespace Text ===
* 尋找: {{kbd | key=<nowiki>[^\s]+</nowiki>}} [https://regex101.com/r/zH7wV3/1 online demo]
* [https://errerrors.blogspot.com/2022/01/avoid-whitespace-character-caused-program-stop-abnormally.html 解決遇到空白段落發生程式異常錯誤而執行中斷的問題] 「... 看起來空白的字元,卻無法使用 TRIM 函數去除,可能是其他的空白字元。解決方式是偵測段落內有沒有包含中英文、數字,再進行後續處理。」


=== 去除標點符號、特殊符號等 ===
* Find: <code>[^\s]+</code>


* [https://stackoverflow.com/questions/5689918/php-strip-punctuation/5689989 regex - PHP strip punctuation - Stack Overflow]
=== Convert Symbol-Separated Text to Line-by-Line Display ===


=== 將特定符號相隔的文字,改成逐行顯示 ===
'''Example:'''
例子:  
<pre>
# (原) 頓號(、)符號相隔的文字
尼歐、莫斐斯、崔妮蒂、史密斯、祭師


# (後) 改成逐行顯示
<pre>Before: 尼歐、莫斐斯、崔妮蒂、史密斯、祭師
After:
尼歐
尼歐
莫斐斯
莫斐斯
崔妮蒂
崔妮蒂
史密斯
史密斯
祭師
祭師</pre>
</pre>
'''Using Sublime Text or EmEditor:''' - Find: <code>([^、]+)([、]{1})</code> - Replace with: <code>\1\n</code>


使用 [http://www.sublimetext.com/ Sublime Text] 或 [https://zh-tw.emeditor.com/ EmEditor]
=== Replace Multiple Spaces with Tab Characters ===
* Find: {{kbd | key = <nowiki>([^、]+)([、]{1})</nowiki>}}
* Replace with: {{kbd | key = <nowiki>\1\n</nowiki>}}


語法說明
'''Before:''' <code>aaa bbb    ccc</code> '''After:''' <code>aaa\tbbb\tccc</code>
* <nowiki>[^、]</nowiki> : 符合任意字,但不是頓號(、)的文字
* <nowiki>[^、]+</nowiki> : 一次以上不是頓號(、)的文字
* <nowiki>([^、]+)</nowiki> : 符合「一次以上不是頓號(、)的文字」規則的文字
* <nowiki>[、]</nowiki>: 出現頓號(、)任意次的文字
* <nowiki>[、]{1}</nowiki> : 出現頓號(、)一次的文字
* <nowiki>([、]{1})</nowiki> : 符合「出現頓號(、)一次的文字」規則的文字


=== 將每行文字的結尾處,加入空一格 (半形空白) ===
'''Using Sublime Text:''' - Find: <code>([^\S\n]+)</code> or <code>([^\S\r\n]+)</code> or <code>\s\s+</code> - Replace with: <code>\t</code>
法1: 適用軟體: Sublime Text, EmEditor
# Menu: Search -> Replace
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>\n</nowiki>}}
## Replace with: {{kbd | key = <nowiki>_\n</nowiki>}}(符號 {{kbd | key = <nowiki>\n</nowiki>}} 前面的 _ 自行替換成半形空白)
# click "Replace all"




法2: 適用軟體: Sublime Text, EmEditor
=== Remove Leading/Trailing Whitespace ===
# Menu: Search -> Replace
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>$</nowiki>}}
## Replace with: {{kbd | key = <nowiki>_$</nowiki>}}(符號 {{kbd | key = <nowiki>$</nowiki>}} 前面的 _ 自行替換成半形空白)
# click "Replace all"




{{exclaim}} 需要檢查最後一行是否是空白行,如果不是空白行,不會套用到該取代規則
==== Remove Leading Whitespace ====


=== 將每行文字內夾雜的空白,取代成 Tab 符號 ===
* Find: <code>^\s+</code>
將原本空白間隔的欄位值,取代成 Tab鍵間隔的欄位值。輸出結果可以方便貼到 MS Excel 或 [[Google spreadsheet]]。
* Replace with: (empty)
<pre># \t 代表是 Tab 鍵,又稱定位鍵
# before
aaa bbb    ccc


# after
aaa\tbbb\tccc
</pre>


說明: \S 代表非空白字元, \r\n 代表[[Return symbol | 換行符號]]。[^\S\r\n] 則代表不是非空白字元、也不是換行符號。換句話說尋找空白,但不包含換行符號。
==== Remove Trailing Whitespace ====


使用  Sublime Text 軟體 (參考資料<ref>[http://www.techrepublic.com/blog/microsoft-office/quickly-replace-multiple-space-characters-with-a-tab-character/ Quickly replace multiple space characters with a tab character - TechRepublic]</ref> <ref>[http://stackoverflow.com/questions/3469080/match-whitespace-but-not-newlines-perl regex - Match whitespace but not newlines (Perl) - Stack Overflow]</ref>)
* Find: <code>\s+$</code>
# Menu: Search -> Replace
* Replace with: (empty)
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>([^\S\n]+)</nowiki>}} 或 {{kbd | key = <nowiki>([^\S\r\n]+)</nowiki>}} 或 {{kbd | key = <nowiki>\s\s+</nowiki>}} 或 {{kbd | key = <nowiki>_{1,}</nowiki>}} ( 自行替換 _ 成半形空白) {{exclaim}} 因為 {{kbd | key = <nowiki>\s</nowiki>}} 包含了空白與換行字元,所以不能直接使用 {{kbd | key = <nowiki>\s+</nowiki>}} 當做搜尋條件
## Replace with: {{kbd | key = <nowiki>\t</nowiki>}}
# click "Replace all"


=== 移除每行文字前後面可能多個的空白 ===
==== 移除每行文字最前面可能多個的空白 ====
* 尋找: {{kbd | key = <nowiki>^\s+</nowiki>}} --> 取代為: 空白 (適用軟體: Sublime Text、EmEditor,需啟用 "Use Regular Expression" )


<pre># before
==== Remove Both Leading and Trailing Whitespace ====
aaa
bbb
    ccc


# after
* Find: <code>(^\s+|\s+$)</code>
aaa
* Replace with: (empty)
bbb
ccc
</pre>




==== 移除每行文字最後面可能多個的空白 ====
== Text Editors Supporting Regular Expressions ==
* 尋找: {{kbd | key = <nowiki>\s+$</nowiki>}} --> 取代為: 空白 (適用軟體: Sublime Text、EmEditor,需啟用 "Use Regular Expression" )
 
 
==== 移除每行文字前面或後面可能多個的空白 ====
* 尋找: {{kbd | key = <nowiki>(^\s+|\s+$)</nowiki>}} --> 取代為: 空白 (適用軟體: Sublime Text、EmEditor,需啟用 "Use Regular Expression" )
 
=== 尋找包含不是數字,是文字的行 ===
預期每行資料都是數字,尋找包含不是數字,是文字的行
<pre>
[^\d|\n]
</pre>
 
=== 尋找 Hashtag ===
[[Extract all hashtags from text]]
 
=== 尋找文章內容中的網址 ===
[[Extract url from text]]
 
 
=== 尋找數字 ===
請參考 [[Data cleaning#Numeric]]
 
* [[Extract large number from text | 尋找文章內容中的長數字]]
* [https://errerrors.blogspot.com/2020/02/convert-minguo-calendar-to-common-era-using-google-sheet.html Google 試算表將民國轉西元日期]
 
=== 移除刮號內的文字 ===
請參考 [[Remove text within brackets]]
 
=== Search unmatched string ===
find un-commented console.log:
 
original format: some lines contains un-commented [[Javascript debug]] information
<pre>
  console.log("un-commented debug information");
 
  //console.log("commented debug information");
</pre>
 
Search pattern: find not started with the / symbol before the string "console.log"
 
<pre>
  [^/](console\.log)
</pre>


== Text editor with support for regular expression ==
Various text editors support regular expressions including: - Sublime Text - EmEditor - Notepad++ - Visual Studio Code - Atom - Vim/Neovim
[[Text editor with support for regular expression]]


== Regular expression batch tools ==
'''multiple''' regular expression operations on the same file
* {{Gd}} [https://github.com/facelessuser/RegReplace RegReplace] 執行多個取代命令 "Simple find and replace sequencer plugin for Sublime Text" Quoted from official webpage. {{access | date=2014-10-25}}
* ''$'' [https://www.emeditor.com/text-editor-features/more-features/batch-replace/ EmEditor (Text Editor) - Batch Replace] & [https://zh-tw.emeditor.com/text-editor-features/coding/regular-expressions/ EmEditor (文字編輯器) | 規則運算式]


one regular expression operations on '''multiple''' files
== Syntax Reference ==
* ''$''  [https://www.emeditor.com/text-editor-features/more-features/find-replace/ EmEditor (Text Editor) | Find and Replace]


== syntax ==
* Newline character: <code>\r\n</code> (for Notepad++: Extended mode &amp; Regular expression mode)
* 換行符號: \r\n (適用: Notepad++選項: 增強模式 & 用類型表式)
* Tab character: <code>\t</code> (for Notepad++: Extended mode)
* tab鍵的固定空白分隔: \t (適用: Notepad++選項: 增強模式)
* Digits: <code>\d</code> (for Notepad++: Regular expression mode only)
* 數字: \d (適用: Notepad++選項: 用類型表式。{{exclaim}} 不適用: Notepad++選項: 增強模式)
* Non-whitespace: <code>\S</code> - Does not include half-width spaces and full-width spaces
* {{kbd | key=<nowiki>\S</nowiki>}} 非空白的文字: 不會含括半形空白與全行空白


== Troubleshooting of regular expression ==
== Troubleshooting Regular Expressions ==
Tips
* Use online tool [https://regex101.com/ regex101: build, test, and debug regex] to obtain the explain of your syntax
* Small data test: (1) Prepare the small file data to verify the syntax (2) Using the [[Regular_expression#Regular_expression_online_tools | online tools]]
* Highlight or output the matched text e.g. {{kbd | key=<nowiki>--color</nowiki>}}<ref>[https://www.cyberciti.biz/faq/howto-use-grep-command-in-linux-unix/grep_command_examples/ Grep -color command Examples - nixCraft]</ref> for grep command or output the matches by PHP [http://php.net/manual/en/function.preg-match.php preg_match()] function.
* Simplify the syntax
* Because the compatibility issue, you may try to use the alternative syntax e.g. {{kbd | key=<nowiki>\d</nowiki>}} to {{kbd | key=<nowiki>[0-9]+</nowiki>}}.


Related articles
'''Tips:''' 1. Use online tools like regex101 to understand your syntax 2. Test with small data: Prepare small file data to verify syntax 3. Highlight or output matched text for debugging 4. Simplify the syntax when encountering issues 5. Try alternative syntax due to compatibility issues (e.g., <code>\d</code> to <code>[0-9]+</code>)
* [https://errerrors.blogspot.com/2015/07/sublime-text-invalid-lookbehind.html Err: 解決 Sublime Text 正則表示式搜尋,遇到的「Invalid lookbehind assertion」錯誤]


== further reading ==
* [http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Searching_And_Replacing SourceForge.net: Searching And Replacing - notepad-plus], [http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions SourceForge.net: Regular Expressions - notepad-plus]
* [http://stackoverflow.com/questions/23020856/text-extraction-with-sublime-text regex - text extraction with sublime text - Stack Overflow] {{access | date=2014-09-26}}
* [https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F 正規表示式 - 維基百科,自由的百科全書]
* [http://www.regular-expressions.info/ Regular-Expressions.info - Regex Tutorial, Examples and Reference - Regexp Patterns]
* [http://linux.vbird.org/linux_basic/0320bash.php 鳥哥的 Linux 私房菜 -- 第十章、認識與學習BASH] {{access | date = 2016-06-08}}
* [https://stackoverflow.com/questions/3548453/negative-matching-using-grep-match-lines-that-do-not-contain-foo Negative matching using grep (match lines that do not contain foo) - Stack Overflow] {{access | date = 2018-04-06}}
* [https://support.google.com/a/answer/1371415?hl=zh-Hant 規則運算式的語法 - G Suite 管理員說明] {{access | date = 2018-12-06}}
unicode
* [http://www.regular-expressions.info/unicode.html Regex Tutorial - Unicode Characters and Properties] {{access | date = 2014-04-02}}
* [http://php.net/manual/en/regexp.reference.unicode.php PHP: Unicode character properties - Manual] {{access | date = 2014-04-02}}


references
== Alternative Solutions ==
<references/>


== 替代方案 ==
* Use Tab-separated data that can be easily pasted into Google Sheets or MS Excel
* 將資料以 {{kbd |key=Tab}}來隔開,貼到Google Drive的Spreadsheet或MS Excel,會自動儲存到不同欄位。所以將需要處理的原始資料中,需要擷取的資料的前後,使用{{kbd |key=Tab}}來隔開,複製後貼到於Google Drive的Spreadsheet或MS Excel,就會自動儲存到不同欄位,方便做進一步處理。
* Copy multiple rows and paste between different applications (compatibility varies)


Copy multiple rows & paste
== Further Reading ==
* Copy to dreamweaver from MS Excel 2002: ok
* Copy to dreamweaver from Google Docs: not ok {{exclaim}}
* Copy to MS Excel 2002 from Google Docs: ok


* Regular-Expressions.info - Regex Tutorial, Examples and Reference
* Unicode character properties documentation
* Platform-specific regular expression documentation


{{Template:Troubleshooting}}
{{Template: Data factory flow}}


[[Category:Regular expression]] [[Category:Software]] [[Category:Programming]] [[Category:Data Science]] [[Category:Search]] [[Category:String manipulation]]
[[Category: Regular expression]]  
[[Category: Software]]  
[[Category: Programming]]  
[[Category: Data Science]]  
[[Category: Search]]  
[[Category: String manipulation]]
[[Category: Revised with LLMs]

Latest revision as of 11:55, 11 December 2025

When processing text files through regular expressions, you can quickly search for or replace strings that match specific rules. Processing is done on a line-by-line basis for string manipulation. Regular expressions are also known as regex, regexp, or pattern matching expressions.

🌐 Switch language: English, 漢字


Raise_hand.png Need Help? You can use the provided explanatory online tools to try debugging yourself.


Quick Reference Table[edit]

Note: (1) Blue highlighted areas in samples represent text matching the rules, (2) The same text rule can have multiple representations

Text Rule Sample Opposite Text Rule Sample
Any single character (including spaces, but not newline)
.
What Does the Fox Say? 12 狐狸怎叫 34
Any character (including spaces), appears 1 or 0 times
.? = .{0,1}
What Does the Fox Say? 12 狐狸怎叫 34
Any number of multiple characters (including spaces)
.* = .{0,}
What Does the Fox Say? 12 狐狸怎叫 34
Any number of characters (including spaces), at least 1 occurrence
.+ = .{1,}
What Does the Fox Say? 12 狐狸怎叫 34
Any number of spaces or newlines (at least 1 occurrence)
\s+
What Does the Fox Say? 12 狐狸怎叫 34 Any number of characters (not including spaces or newlines)
[^\s]+ = [^\s]{1,} = [\S]+ = [^ ]+
What Does the Fox Say? 12 狐狸怎叫 34
Any number of ASCII characters (including English, numbers and spaces)
[\x00-\x80]+ or ascii:+
What Does the Fox Say? 12 狐狸怎叫 34 Non-ASCII, i.e., Chinese characters appearing any number of times
[^\x00-\x80]+
What Does the Fox Say? 12 狐狸怎叫 34
Any number of uppercase/lowercase English letters, numbers and underscore (_) (not including spaces)
[\w]+ = [a-zA-Z0-9_]+
PHP with u modifier supports Chinese characters
What Does the Fox Say? 12 狐狸怎叫 _34 Any number of characters that are not English letters, numbers and underscore (_)
\W+ = [^a-zA-Z0-9_]+
Any number of digits (not including spaces)
[\d]+ = [0-9]+
What Does the Fox Say? 12 狐狸怎叫 34 Any number of characters not including digits (including spaces)
[^\d]+ = [^0-9]+ = \D+
What Does the Fox Say? 12 狐狸怎叫 34
Any number of Chinese characters
[\p{Han}]+
What Does the Fox Say? 12 狐狸怎叫 34 Any number of characters not including Chinese
[^\p{Han}]+
Lines starting with “狐狸”
^狐狸.*$
狐狸怎叫 34 What Does the Fox Say?
柴犬怎叫 What Does the shiba inu say?
Lines not starting with “狐狸”
^(?!狐狸).*$
狐狸怎叫 34 What Does the Fox Say?
柴犬怎叫 What Does the shiba inu say?
Lines ending with “怎叫”
^.*怎叫$
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
Lines not ending with “怎叫”
.*(?<!怎叫)$
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
Lines containing “狐狸”
^.*狐狸.*$ or (狐狸)
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
Lines not containing “狐狸”
^((?!狐狸).)*$
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
叫.*狐狸 What Does the Fox Say? 12 狐狸怎叫 34
What Does the Fox Say? 12 不叫狐狸 34
What Does the shiba inu say? 柴犬怎叫
叫).* What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫

What Does the shiba inu say? 柴犬怎了
柴犬).)*$ What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
What Does the Husky say? 哈士奇怎叫
Boolean logic NOT: Lines not containing “狐狸” but containing “柴犬”
^((?!狐狸).)*(柴犬).*$ = ^(柴犬).*((?!狐狸).)*$ = (柴犬).*((?!狐狸).)*
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫


Regular Expression Online Tools[edit]

Websites for testing regular expression syntax:


Common Use Cases[edit]

Replace Newlines with Commas[edit]

Converting email lists into a format usable by email software:

Original:
[email protected]
[email protected]
[email protected]

Convert to:
[email protected],[email protected],[email protected]

Method 1: Sublime Text, EmEditor[edit]

  1. Menu: Search -> Replace
  2. Check “Use Regular Expression”
    • Find: \n (newline character)
    • Replace with: ,
  3. Click “Replace all”


Method 2: Notepad++[edit]

  1. Menu: Find -> Replace
  2. Search mode: Check “Extended mode” (not “Regular expression”)
    • Find: \n
    • Replace with: ,
  3. Click “Replace All”


Method 3: Microsoft Word[edit]

  1. Menu: Edit -> Replace
  2. Check extended mode
    • Find: ^p (paragraph mark)
    • Replace with: ,
  3. Click “Replace All”

Method 4: Sed command for Linux[edit]

sed ':a;N;$!ba;s/\n/; /g' old.filename > new.filename

Find IP Addresses (IPv4)[edit]

For Notepad++ v.5.9.5: - Find: \d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?

For Sublime Text v. 3.2.21: - Find: (?:\d{1,3}\.){3}\d{1,3}

Remove Black Squares (UNIX Line Endings LF)[edit]

Using Notepad++: 1. Menu: Find -> Replace 2. Search mode: Check “Extended mode” - Find: \n\n (2 LF characters) - Replace with: \r\n (CR and LF)

Add Quotes Around Elements[edit]

Add Quotes Around Array Elements[edit]

Before: Elmo, Emie, Granny Bird, Herry Monster, 喀喀獸
After: 'Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸'

Method 1: PHP

$users = array('Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸');
// Single quotes around each element
$result = implode(",", preg_replace('/^(.*?)$/', "'$1'", $users));
// Double quotes around each element
$result = implode(",", preg_replace('/^(.*?)$/', "\"$1\"", $users));
echo $result;

Method 2: Sublime Text or EmEditor - Find: ([^\s|,]+) - Replace with: '\1' (for single quotes) or "\1" (for double quotes)

Method 3: Notepad++ (Enable “Regular expression” search mode) - Find: ([^\s|,]+) - Replace with: '$1' (for single quotes) or "$1" (for double quotes)

Find Non-ASCII Characters (Chinese/Non-English Text)[edit]

In LibreOffice[edit]

[^\u0000-\u0080]+


Find Chinese Characters in Google Sheets[edit]

Example: If cell A2 contains any Chinese character, display “Chinese”, otherwise display “English”:

=IF(REGEXMATCH(A2, "[\一-\龥]"), "Chinese", "English")

Find Non-ASCII Characters in Google Sheets[edit]

Extract non-ASCII characters (such as Chinese, Japanese, emoji, etc.) from cell A2

=IF(ISERROR(REGEXEXTRACT(A2, "[^\x00-\x80]+")), "", REGEXEXTRACT(A2, "[^\x00-\x80]+"))

Explanation of regular expression [^\x00-\x80]+

  • [\x00-\x80]: Represents the ASCII character range (character codes 0-128). (1) Standard ASCII range: 0-127 (0x00-0x7F aka * [\x00-\x7F])[1] (2) Character 128 ((0x80) is actually the first character in the extended ASCII range, not part of the original ASCII standard.[2][3]
  • [^...]: Means "not" these characters
  • +: Means one or more

Overall meaning: Matches one or more non-ASCII characters

Find Chinese Characters in MySQL[edit]

Find rows where column_name contains Chinese characters:

SELECT `column_name`
FROM `table_name`
WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';

Query condition used to match records where the column_name field contains only Chinese characters.

SELECT `column_name`
FROM `table_name`
WHERE `column_name` REGEXP '^[一-龯]+$';

Explanation:

  • [一-龯] - Character set that matches all characters from "一" to "龯" in Unicode
  • "一" has Unicode code point U+4E00[4]
  • "龯" has Unicode code point U+9FEF[5]
  • This range U+4E00-U+9FFF already covers over 99% of daily Chinese usage requirements Extension B and later blocks mainly contain ancient Chinese characters, variant characters, etc., which rarely appear in modern texts

Find Non-ASCII Characters in MySQL[edit]

Find rows where column_name is not entirely ASCII characters:

SELECT `column_name`
FROM `table_name`
WHERE `column_name` <> CONVERT(`column_name` USING ASCII)

Find Chinese Characters in PHP[edit]

Exact match:

// Approach 1
if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $string)) {
    echo "All text is Chinese characters" . PHP_EOL;
} else {
    echo "Some text is not Chinese characters" . PHP_EOL;
}

// Approach 2
if (preg_match('/^[\p{Han}]+$/u', $string)) {
    echo "All text is Chinese characters" . PHP_EOL;
} else {
    echo "Some text is not Chinese characters" . PHP_EOL;
}

Partial match:

// Approach 1
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\p{Han}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);

// Approach 2
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\x{4e00}-\x{9fa5}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);

Find ASCII Characters in PHP[edit]

Code I:

if (preg_match('/[^\x20-\x7f]/', $keyword) === 0) {
    echo "The keyword is ASCII only";
} else {
    echo "The keyword contains non-ASCII characters (like Chinese, Japanese, etc.)";
}

Code II:

$pattern = '/^[[:ascii:]]+$/i';
$text = "Hello World"; // ASCII only
if (preg_match($pattern, $text)) {
    echo "Pure ASCII characters";
} else {
    echo "Contains non-ASCII characters";
}

Remove Empty Lines[edit]

Original:

Neo
Trinity

Morpheus


Smith
Oracle

After:

Neo
Trinity
Morpheus
Smith
Oracle

Using Sublime Text & EmEditor: - Find: ^[\s\t]*$\n - Replace with: (empty)

Using Notepad++ v7.8.7: - Menu: Edit -> Line Operations -> Remove Empty Lines (Including Blank Lines)

Find Non-Whitespace Text[edit]

  • Find: [^\s]+

Convert Symbol-Separated Text to Line-by-Line Display[edit]

Example:

Before: 尼歐、莫斐斯、崔妮蒂、史密斯、祭師
After:
尼歐
莫斐斯
崔妮蒂
史密斯
祭師

Using Sublime Text or EmEditor: - Find: ([^、]+)([、]{1}) - Replace with: \1\n

Replace Multiple Spaces with Tab Characters[edit]

Before: aaa bbb ccc After: aaa\tbbb\tccc

Using Sublime Text: - Find: ([^\S\n]+) or ([^\S\r\n]+) or \s\s+ - Replace with: \t


Remove Leading/Trailing Whitespace[edit]

Remove Leading Whitespace[edit]

  • Find: ^\s+
  • Replace with: (empty)


Remove Trailing Whitespace[edit]

  • Find: \s+$
  • Replace with: (empty)


Remove Both Leading and Trailing Whitespace[edit]

  • Find: (^\s+|\s+$)
  • Replace with: (empty)


Text Editors Supporting Regular Expressions[edit]

Various text editors support regular expressions including: - Sublime Text - EmEditor - Notepad++ - Visual Studio Code - Atom - Vim/Neovim


Syntax Reference[edit]

  • Newline character: \r\n (for Notepad++: Extended mode & Regular expression mode)
  • Tab character: \t (for Notepad++: Extended mode)
  • Digits: \d (for Notepad++: Regular expression mode only)
  • Non-whitespace: \S - Does not include half-width spaces and full-width spaces

Troubleshooting Regular Expressions[edit]

Tips: 1. Use online tools like regex101 to understand your syntax 2. Test with small data: Prepare small file data to verify syntax 3. Highlight or output matched text for debugging 4. Simplify the syntax when encountering issues 5. Try alternative syntax due to compatibility issues (e.g., \d to [0-9]+)


Alternative Solutions[edit]

  • Use Tab-separated data that can be easily pasted into Google Sheets or MS Excel
  • Copy multiple rows and paste between different applications (compatibility varies)

Further Reading[edit]

  • Regular-Expressions.info - Regex Tutorial, Examples and Reference
  • Unicode character properties documentation
  • Platform-specific regular expression documentation

Data factory flow

[[Category: Revised with LLMs]