Regular expression: Difference between revisions

Regular expressions (also known as regex or regexp) let you quickly search for or replace strings that match specific rules when processing text files. Matching is typically performed line by line<ref>[http://linux.vbird.org/linux_basic/0330regularex.php 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理]</ref>. In Chinese the term is rendered as 正規表示法, 正規表示式, 正則表達式, 正規運算式, 規則運算式, or 常規表示法<ref>[https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F 正規表示式 - 維基百科,自由的百科全書]</ref>.


{{Raise hand | text = '''Need Help?''' You can use the [[#regular-expression-online-tools|online tools]], which explain the syntax, to try debugging yourself. You can also ask on the [http://www.ptt.cc/bbs/RegExp/index.html PTT RegExp board] or another [[問答服務|Q&A service]]. }}
{{LanguageSwitcher | content = [[Regular expression | English]], [[Regular expression in Mandarin|漢字]]}}


== Quick reference ==
Note: in each sample, the text with a blue background is the text that matches the rule.
<table border="1" style="width:100%">
<tr >
<th style="background-color: #E0E0E0;"> Text rule </th>
<th style="background-color: #E0E0E0; width:260px;"> Sample </th>
<th style="background-color: #9c9ca3;"> Opposite text rule </th>
<th style="background-color: #9c9ca3; width:260px;"> Sample </th>
</tr>
<tr>
<td> Any single character (including spaces, but not newlines) <br /> {{kbd | key = <nowiki>.</nowiki>}} </td>
<td><span style="background:#C6E3FF">W</span>hat Does the Fox Say? 12 狐狸怎叫 34</td>
<td></td>
<td></td>
</tr>
<tr>
<td> Any character (including spaces), zero or one time <br /> {{kbd | key = <nowiki>.?</nowiki>}} = {{kbd | key = <nowiki>.{0,1}</nowiki>}}</td>
<td><span style="background:#C6E3FF">W</span>hat Does the Fox Say? 12 狐狸怎叫 34</td>
<td></td>
<td></td>
</tr>
<tr>
<td> Any characters (including spaces), zero or more times <br /> {{kbd | key = <nowiki>.*</nowiki>}} = {{kbd | key = <nowiki>.{0,}</nowiki>}}</td>
<td><span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span></td>
<td></td>
<td></td>
</tr>
<tr>
<td> Any characters (including spaces), one or more times <br /> {{kbd | key = <nowiki>.+</nowiki>}} = {{kbd | key = <nowiki>.{1,}</nowiki>}}</td>
<td><span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span></td>
<td></td>
<td></td>
</tr>
<tr>
<td> One or more whitespace characters (spaces or newlines) <br /> {{kbd | key = <nowiki>\s+</nowiki>}} </td>
<td>What<span style="background:#C6E3FF"> </span>Does the Fox Say? 12 狐狸怎叫 34</td>
<td> One or more characters that are not spaces or newlines <br /> {{kbd | key = <nowiki>[^\s]+</nowiki>}} = {{kbd | key = <nowiki>[^\s]{1,}</nowiki>}} = {{kbd | key = <nowiki>[\S]+</nowiki>}}; {{kbd | key = <nowiki>[^ ]+</nowiki>}} excludes only the space character</td>
<td><span style="background:#C6E3FF">What</span> Does the Fox Say? 12 狐狸怎叫 34</td>
</tr>
<tr>
<td> One or more ASCII characters (including English letters, digits, and spaces) [http://regexr.com/3aom2 demo]<ref>[http://www.asciitable.com/ understand]</ref> <br /> {{kbd | key = <nowiki>[\x00-\x80]+</nowiki>}}</td>
<td><span style="background:#C6E3FF">What Does the Fox Say? 12</span> 狐狸怎叫 34</td>
<td> One or more non-ASCII characters, such as Chinese <br /> {{kbd | key = <nowiki>[^\x00-\x80]+</nowiki>}}</td>
<td>What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34</td>
</tr>
<tr>
<td> One or more English letters, digits, or underscores ( _ ), not including spaces <br /> {{kbd | key = <nowiki>[\w]+</nowiki>}} = {{kbd | key = <nowiki>[a-zA-Z0-9_]+</nowiki>}} </td>
<td><span style="background:#C6E3FF">What</span> Does the Fox Say? 12 狐狸怎叫 34</td>
<td> One or more characters that are not English letters, digits, or underscores ( _ ) <br /> {{kbd | key = <nowiki>\W+</nowiki>}} = {{kbd | key = <nowiki>[^a-zA-Z0-9_]+</nowiki>}}</td>
<td>[http://regexr.com/3bk4v demo]</td>
</tr>
<tr>
<td> One or more digits (not including spaces) <br /> {{kbd | key = <nowiki>[\d]+</nowiki>}} = {{kbd | key = <nowiki>[0-9]+</nowiki>}}</td>
<td>What Does the Fox Say? <span style="background:#C6E3FF">12</span> 狐狸怎叫 34</td>
<td> One or more non-digit characters (including spaces) <br /> {{kbd | key = <nowiki>[^\d]+</nowiki>}} = {{kbd | key = <nowiki>[^0-9]+</nowiki>}} = {{kbd | key = <nowiki>\D+</nowiki>}} </td>
<td><span style="background:#C6E3FF">What Does the Fox Say? </span>12 狐狸怎叫 34</td>
</tr>
<tr>
<td> Lines starting with 「狐狸」 <br /> {{kbd | key = <nowiki>^狐狸.*$</nowiki>}}<ref>[http://www.regular-expressions.info/completelines.html Regex Examples: Matching Whole Lines of Text That Satisfy Certain Requirements]</ref></td>
<td>
<span style="background:#C6E3FF">狐狸怎叫 34 What Does the Fox Say?</span><br />
柴犬怎叫 What Does the shiba inu say?
</td>
<td> Lines not starting with 「狐狸」 <br /> {{kbd | key = <nowiki>^(?!狐狸).*$</nowiki>}}<ref>[http://stackoverflow.com/questions/406230/regular-expression-to-match-text-that-doesnt-contain-a-word regex - Regular expression to match text that *doesn't* contain a word? - Stack Overflow]</ref> </td>
<td>
狐狸怎叫 34 What Does the Fox Say?<br />
<span style="background:#C6E3FF">柴犬怎叫 What Does the shiba inu say?</span>
</td>
</tr>
<tr>
<td> Lines ending with 「怎叫」 <br /> {{kbd | key = <nowiki>^.*怎叫$</nowiki>}}</td>
<td>
What Does the Fox Say? 12 狐狸怎叫 34<br />
<span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
</td>
<td> Lines not ending with 「怎叫」 <br /> {{kbd | key = <nowiki>.*(?<!怎叫)$</nowiki>}}<ref>[http://stackoverflow.com/questions/16398471/regex-not-ending-with Regex not ending with - Stack Overflow]</ref></td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br />
What Does the shiba inu say? 柴犬怎叫
</td>
</tr>
<tr>
<td> Lines containing 「狐狸」 <br /> {{kbd | key = <nowiki>^.*狐狸.*$</nowiki>}}</td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br />
What Does the shiba inu say? 柴犬怎叫
</td>
<td> Lines not containing 「狐狸」 <br /> {{kbd | key = <nowiki>^((?!狐狸).)*$</nowiki>}} </td>
<td>
What Does the Fox Say? 12 狐狸怎叫 34<br />
<span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫 </span>
</td>
</tr>
<tr>
<td> Boolean logic AND: lines containing both 「狐狸」 and 「叫」 ([http://regexr.com/3aokl demo])<ref>[http://stackoverflow.com/questions/469913/regular-expressions-is-there-an-and-operator regex - Regular Expressions: Is there an AND operator? - Stack Overflow]</ref><br /> {{kbd | key = <nowiki>(?=.*狐狸)(?=.*叫).*</nowiki>}}</td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br />
<span style="background:#C6E3FF">What Does the Fox Say? 12 不叫狐狸 34</span><br />
What Does the shiba inu say? 柴犬怎叫
</td>
<td></td>
<td></td>
</tr>
<tr>
<td> Boolean logic OR: lines containing 「狐狸」 or 「叫」 ([http://regexr.com/3aoko demo])<br /> {{kbd | key = <nowiki>.*(狐狸|叫).*</nowiki>}}</td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34<br />
What Does the shiba inu say? 柴犬怎叫</span><br />
What Does the shiba inu say? 柴犬怎了
</td>
<td> Boolean logic: lines containing neither 「狐狸」 nor 「柴犬」<br /> {{kbd | key = <nowiki>^((?!狐狸|柴犬).)*$</nowiki>}}</td>
<td>What Does the Fox Say? 12 狐狸怎叫 34<br />
What Does the shiba inu say? 柴犬怎叫<br />
<span style="background:#C6E3FF">What Does the Husky say? 哈士奇怎叫 </span></td>
</tr>
<tr>
<td> Boolean logic NOT: lines that contain 「柴犬」 but not 「狐狸」 ([http://regexr.com/3aokr demo])<ref>[http://stackoverflow.com/questions/2953039/regular-expression-for-a-string-containing-one-word-but-not-another regex - Regular expression for a string containing one word but not another - Stack Overflow]</ref><br /> {{kbd | key = <nowiki>^(?!.*狐狸).*柴犬.*$</nowiki>}} <br /> The form {{kbd | key = <nowiki>^((?!狐狸).)*(柴犬).*$</nowiki>}} only rejects 「狐狸」 when it appears before 「柴犬」.</td>
<td>
What Does the Fox Say? 12 狐狸怎叫 34<br />
<span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
</td>
<td></td>
<td></td>
</tr>
</table>


== Regular expression online tools ==
* [http://regex101.com/ Online regex tester and debugger: JavaScript, Python, PHP, and PCRE] ([http://regex101.com/r/tH1eT7/1 example]) {{Gd}} Provides syntax explanations
* [http://www.rubular.com/ Rubular]: a Ruby regular expression editor and tester ([http://www.rubular.com/r/UZuUT5pjeh example])
* [http://gskinner.com/RegExr/ RegExr]: Learn, Build, & Test RegEx ([http://regexr.com/395t0 example]). {{Gd}} Provides syntax explanations. Tutorial: [http://blog.hsdn.net/1426.html RegExr: 功能強大的正規式撰寫協助工具]
* [http://www.phpliveregex.com/ PHP Live Regex] {{access | date=2014-11-25}}
* [http://www.gethifi.com/tools/regex HiFi Regex Tester - Live JavaScript Regular Expression Tester] for Javascript {{access | date=2014-12-23}}
* [http://www.regextester.com/ Regex Tester and Debugger Online - Javascript, PCRE, PHP] {{access | date=2016-01-07}}
* [http://rocksaying.tw/archives/2670695.html Regular Expression (RegExp) in JavaScript - 石頭閒語] {{access | date=2017-11-14}}


Examples:
* {{Gd}} [http://regexlib.com/ Regular Expression Library]: pattern examples contributed by users


== Quick Reference Table ==
Note: (1) Blue highlighted areas in the samples mark the text that matches the rule. (2) The same text rule can be written in more than one way.

{| class="wikitable"
|-
! Text Rule
! Sample
! Opposite Text Rule
! Sample
|-
| Any single character (including spaces, but not newlines) <br> <code>.</code>
| <span style="background:#C6E3FF">W</span>hat Does the Fox Say? 12 狐狸怎叫 34
|
|
|-
| Any character (including spaces), zero or one time <br> <code>.?</code> = <code>.{0,1}</code>
| <span style="background:#C6E3FF">W</span>hat Does the Fox Say? 12 狐狸怎叫 34
|
|
|-
| Any characters (including spaces), zero or more times <br> <code>.*</code> = <code>.{0,}</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span>
|
|
|-
| Any characters (including spaces), one or more times <br> <code>.+</code> = <code>.{1,}</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span>
|
|
|-
| One or more whitespace characters (spaces or newlines) <br> <code>\s+</code>
| What<span style="background:#C6E3FF"> </span>Does the Fox Say? 12 狐狸怎叫 34
| One or more characters that are not spaces or newlines <br> <code>[^\s]+</code> = <code>[^\s]{1,}</code> = <code>[\S]+</code>; <code>[^ ]+</code> excludes only the space character
| <span style="background:#C6E3FF">What</span> Does the Fox Say? 12 狐狸怎叫 34
|-
| One or more ASCII characters (including English letters, digits, and spaces) <br> <code>[\x00-\x80]+</code> or <code><nowiki>[[:ascii:]]+</nowiki></code> (strict ASCII is <code>[\x00-\x7F]</code>)
| <span style="background:#C6E3FF">What Does the Fox Say? 12</span> 狐狸怎叫 34
| One or more non-ASCII characters, such as Chinese <br> <code>[^\x00-\x80]+</code>
| What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34
|-
| One or more uppercase/lowercase English letters, digits, or underscores (_), not including spaces <br> <code>[\w]+</code> = <code>[a-zA-Z0-9_]+</code> <br> In PHP, adding the <code>u</code> modifier makes this also match Chinese characters
| <span style="background:#C6E3FF">What</span> <span style="background:#C6E3FF">Does</span> <span style="background:#C6E3FF">the</span> <span style="background:#C6E3FF">Fox</span> <span style="background:#C6E3FF">Say</span>? <span style="background:#C6E3FF">12</span> 狐狸怎叫 <span style="background:#C6E3FF">_34</span>
| One or more characters that are not English letters, digits, or underscores (_) <br> <code>\W+</code> = <code>[^a-zA-Z0-9_]+</code>
|
|-
| One or more digits (not including spaces) <br> <code>[\d]+</code> = <code>[0-9]+</code>
| What Does the Fox Say? <span style="background:#C6E3FF">12</span> 狐狸怎叫 34
| One or more non-digit characters (including spaces) <br> <code>[^\d]+</code> = <code>[^0-9]+</code> = <code>\D+</code>
| <span style="background:#C6E3FF">What Does the Fox Say? </span>12 狐狸怎叫 34
|-
| One or more Chinese characters <br> <code>[\p{Han}]+</code>
| What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34
| One or more characters that are not Chinese <br> <code>[^\p{Han}]+</code>
|
|-
| Lines starting with “狐狸” <br> <code>^狐狸.*$</code>
| <span style="background:#C6E3FF">狐狸怎叫 34 What Does the Fox Say?</span><br>柴犬怎叫 What Does the shiba inu say?
| Lines not starting with “狐狸” <br> <code>^(?!狐狸).*$</code>
| 狐狸怎叫 34 What Does the Fox Say?<br><span style="background:#C6E3FF">柴犬怎叫 What Does the shiba inu say?</span>
|-
| Lines ending with “怎叫” <br> <code>^.*怎叫$</code>
| What Does the Fox Say? 12 狐狸怎叫 34<br><span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
| Lines not ending with “怎叫” <br> <code>.*(?&lt;!怎叫)$</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br>What Does the shiba inu say? 柴犬怎叫
|-
| Lines containing “狐狸” <br> <code>^.*狐狸.*$</code> or <code>(狐狸)</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br>What Does the shiba inu say? 柴犬怎叫
| Lines not containing “狐狸” <br> <code>^((?!狐狸).)*$</code>
| What Does the Fox Say? 12 狐狸怎叫 34<br><span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
|-
| Boolean logic AND: Lines containing both “狐狸” and “叫” <br> <code>(?=.*狐狸)(?=.*叫).*</code> or <code>狐狸.*叫&#124;叫.*狐狸</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br><span style="background:#C6E3FF">What Does the Fox Say? 12 不叫狐狸 34</span><br>What Does the shiba inu say? 柴犬怎叫
|
|
|-
| Boolean logic OR: Lines containing “狐狸” or “叫” <br> <code>.*(狐狸&#124;叫).*</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34<br>What Does the shiba inu say? 柴犬怎叫</span><br>What Does the shiba inu say? 柴犬怎了
| Boolean logic: Lines containing neither “狐狸” nor “柴犬” <br> <code>^((?!狐狸&#124;柴犬).)*$</code>
| What Does the Fox Say? 12 狐狸怎叫 34<br>What Does the shiba inu say? 柴犬怎叫<br><span style="background:#C6E3FF">What Does the Husky say? 哈士奇怎叫</span>
|-
| Boolean logic NOT: Lines containing “柴犬” but not “狐狸” <br> <code>^(?!.*狐狸).*柴犬.*$</code> <br> The form <code>^((?!狐狸).)*(柴犬).*$</code> only rejects “狐狸” when it appears before “柴犬”
| What Does the Fox Say? 12 狐狸怎叫 34<br><span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
|
|
|}


== Regular Expression Online Tools ==

Websites for testing regular expression syntax:
* {{Gd}} [http://regex101.com/ RegEx101] - “Online regex tester and debugger: PHP, PCRE, Python, Golang and JavaScript” - Provides syntax explanations
* {{Gd}} [http://gskinner.com/RegExr/ RegExr] - Learn, Build, &amp; Test RegEx - Provides syntax explanations
* [https://regexper.com/ Regexper] - Visual explanation of syntax using diagrams
* [https://jex.im/regulex/ Regulex] - JavaScript Regular Expression Visualizer - Visual explanation using diagrams
* [http://www.rubular.com/ Rubular] - A Ruby regular expression editor and tester
* [http://www.phpliveregex.com/ PHP Live Regex]
* [http://www.regextester.com/ Regex Tester and Debugger Online] - JavaScript, PCRE, PHP
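
Several line-level rules from the Quick Reference Table can be tried outside a text editor. The following is a minimal Python sketch, assuming the standard <code>re</code> module; the helper <code>matching</code> and the sample list are illustrative names, not part of the original page.

```python
import re

# Sample lines modelled on the examples in the quick reference table.
lines = [
    "What Does the Fox Say? 12 狐狸怎叫 34",
    "What Does the shiba inu say? 柴犬怎叫",
    "What Does the Husky say? 哈士奇怎了",
]


def matching(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [line for line in candidates if re.fullmatch(pattern, line)]


# Lines containing 狐狸.
contains_fox = matching(r".*狐狸.*", lines)
# Lines not containing 狐狸 (negative lookahead checked at every position).
lacks_fox = matching(r"((?!狐狸).)*", lines)
# Boolean AND: both 狐狸 and 叫 appear somewhere on the line.
fox_and_cry = matching(r"(?=.*狐狸)(?=.*叫).*", lines)
# Boolean NOT: 柴犬 appears and 狐狸 does not appear anywhere.
shiba_not_fox = matching(r"(?!.*狐狸).*柴犬.*", lines)
```

Because <code>re.fullmatch</code> already anchors both ends, the <code>^</code> and <code>$</code> from the table are omitted here.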


== Common Use Cases ==
=== Replace Newlines with Commas ===
Converting email lists into a format usable by email software:

<pre>Original:
aaa@email.com
bbb@email.com
ccc@email.com

Convert to:
aaa@email.com, bbb@email.com, ccc@email.com</pre>

==== Method 1: Sublime Text, EmEditor ====
This syntax works in [http://www.sublimetext.com/ Sublime Text] and [http://www.emeditor.com/ EmEditor] (the steps below are for EmEditor).
# Menu: Search -&gt; Replace
# Check “Use Regular Expression”
#* Find: <code>\n</code> ([https://zh.wikipedia.org/wiki/%E6%8F%9B%E8%A1%8C newline character]). {{Win}} ends lines with <code>\r\n</code>, while {{Linux}} and {{Mac}} (macOS) end lines with <code>\n</code>, so <code>\n</code> is the part common to both. Classic Mac OS used <code>\r</code>.
#* Replace with: {{kbd | key = <nowiki>, </nowiki>}} (a comma followed by a space)
# Click “Replace all”


==== Method 2: Notepad++ ====
Using [http://notepad-plus-plus.org/ Notepad++]:
# Menu: Find -&gt; Replace
# Search mode: Check “Extended mode” (not “Regular expression”)
#* Find: <code>\n</code> (newline character)
#* Replace with: {{kbd | key = <nowiki>, </nowiki>}}
# Click “Replace All”

Related: [http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Replacing_Newlines How To Replace Line Ends, thus changing the line layout] (last visited 2010-01-27)


==== Method 3: Microsoft Word ====
Using Microsoft Word 2002:
# Menu: Edit -&gt; Replace
# Check extended mode
#* Find: <code>^p</code> (paragraph mark)
#* Replace with: {{kbd | key = <nowiki>, </nowiki>}}
# Click “Replace All”


==== Method 4: Sed command for Linux ====
The general form of a sed replacement is<ref>[http://linux.vbird.org/linux_basic/0330regularex.php#sed_replace 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理]</ref>:
<syntaxhighlight lang="bash">sed 's/string to replace/new string/g' old.filename > new.filename</syntaxhighlight>
To replace newlines, the script first loops (<code>:a;N;$!ba</code>) to read the whole file into the pattern space, then substitutes each <code>\n</code>. This example joins the lines with a semicolon and a space<ref>See [http://stackoverflow.com/questions/1251999/sed-how-can-i-replace-a-newline-n unix - sed: How can I replace a newline? ]</ref>:
<syntaxhighlight lang="bash">sed ':a;N;$!ba;s/\n/; /g' old.filename > new.filename</syntaxhighlight>


<div style="float: left; width: 100%; position: relative; display: block; clear: left;">
<div style="width: 46%;  float: left; margin:0 auto; position: relative; display: block; ">
==== Join lines into one comma-separated line ====
<pre>
// before
Elmo
Emie
Granny Bird

// after
Elmo, Emie, Granny Bird
</pre>
Method: use [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor].
* Find what: {{kbd | key = <nowiki>\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>, </nowiki>}} This joins the lines with a comma and a space. To use a different separator, such as the ideographic comma, use Replace with: {{kbd | key = <nowiki>、</nowiki>}}

</div>

<div style="width: 46%; float: left; margin:0 auto; position: absolute; display: block; left: 54%; top: 0;">
==== Split comma-separated text back into one item per line (removing the commas) ====
<pre>
// before
Elmo, Emie, Granny Bird

// after
Elmo
Emie
Granny Bird
</pre>
Method: use [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor]. {{exclaim}} Each output line may start with a space.
* Find what: {{kbd | key = <nowiki>([^,]+),</nowiki>}}
* Replace with: {{kbd | key = <nowiki>\1\n</nowiki>}}

</div>
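
The join and split replacements can also be scripted. The following is a minimal Python sketch, assuming the standard <code>re</code> module; the variable names are illustrative only.

```python
import re

names = "Elmo\nEmie\nGranny Bird"

# Join: replace each line break (LF or CRLF) with a comma and a space.
joined = re.sub(r"\r?\n", ", ", names)

# Split: turn each comma, plus any spaces after it, back into a line break.
# Consuming the spaces avoids a leading blank on the restored lines.
restored = re.sub(r",\s*", "\n", joined)
```

Matching <code>\r?\n</code> rather than <code>\n</code> alone keeps a stray carriage return out of the result when the input uses Windows line endings.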
</div>


<div style="clear:both;">&nbsp;</div>

=== Find IP Addresses (IPv4) ===
Using [http://notepad-plus-plus.org/ Notepad++] v.5.9.5:
# Menu: Find -&gt; Replace
# Search mode: Check “Regular expression”
#* Find: <code>\d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?</code>

Note: this Notepad++ version does not support the <code>{n}</code> repetition syntax, hence the <code>\d\d?\d?</code> form.

Using Sublime Text v. 3.2.21:
* Find: <code>(?:\d{1,3}\.){3}\d{1,3}</code>

Both patterns match four dot-separated groups of one to three digits. They do not check that each group is at most 255.
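
Since the patterns only check the shape of an address, a range check can be added in code. The following is a minimal Python sketch, assuming the standard <code>re</code> module; <code>find_ipv4</code> and the sample log line are hypothetical names introduced for illustration.

```python
import re

# Same shape as the Sublime Text pattern: four groups of 1-3 digits.
CANDIDATE = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")


def find_ipv4(text):
    """Return dotted-quad matches whose four parts are all in 0-255."""
    found = []
    for candidate in CANDIDATE.findall(text):
        if all(int(part) <= 255 for part in candidate.split(".")):
            found.append(candidate)
    return found


log = "gateway 192.168.0.1, bad 999.1.1.1, dns 8.8.8.8"
addresses = find_ipv4(log)
```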
References for IP address patterns:
* [http://sourceforge.net/projects/notepad-plus/forums/forum/331754/topic/4780602 SourceForge.net: Notepad++: Regular expression for IP addresses]
* [http://stackoverflow.com/questions/53497/regular-expression-that-matches-valid-ipv6-addresses regex - Regular expression that matches valid IPv6 addresses - Stack Overflow] {{access | date = 2015-08-10}}


=== Remove Black Squares (UNIX Line Endings LF) ===
When a plain-text file with UNIX (LF) line endings is opened in Notepad, the line endings can show up as black squares. Using Notepad++:
# Menu: Find -&gt; Replace
# Search mode: Check “Extended mode”
#* Find: <code>\n\n</code> (2 LF characters)
#* Replace with: <code>\r\n</code> (CR and LF)

After the replacement, the black squares no longer appear when the file is opened in Notepad.


=== Add Quotes Around Elements ===
==== Add Quotes Around Array Elements ====
<pre>Before: Elmo, Emie, Granny Bird, Herry Monster, 喀喀獸
After: 'Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸'</pre>
'''Method 1: PHP'''

{{exclaim}} This approach does not work if an element contains a newline.
<syntaxhighlight lang="php">$users = array('Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸');
// Single quotes around each element
$result = implode(",", preg_replace('/^(.*?)$/', "'$1'", $users));
// Double quotes around each element
$result = implode(",", preg_replace('/^(.*?)$/', "\"$1\"", $users));
echo $result;</syntaxhighlight>
Thanks, Joshua! More on [http://melikedev.com/2010/02/24/php-wrap-implode-array-elements-in-quotes/ PHP - Wrap Implode Array Elements in Quotes » Me Like Dev]

'''Method 2: [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor]'''
* Find: <code>([^\s|,]+)</code>
* Replace with: <code>'\1'</code> (for single quotes) or <code>&quot;\1&quot;</code> (for double quotes)

'''Method 3: [https://notepad-plus-plus.org/ Notepad++]''' (enable the “Regular expression” search mode)
* Find: <code>([^\s|,]+)</code>
* Replace with: <code>'$1'</code> (for single quotes) or <code>&quot;$1&quot;</code> (for double quotes)

Note: inside <code>[...]</code> the <code>|</code> is a literal character, and because the class excludes whitespace, a multi-word element such as <code>Granny Bird</code> is quoted as two separate words.


<div style="float: left; width: 100%; position: relative; display: block; clear: left;">
<div style="width: 46%;  float: left; margin:0 auto; position: relative; display: block; ">
==== Wrap each line in quotes and join the lines ====
<pre>
// before
Elmo
Emie
Granny Bird

// after
'Elmo', 'Emie', 'Granny Bird'
</pre>
Method 1: use [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor]. This method is intended to tolerate one or more spaces before or after each line.
* Find what: {{kbd | key = <nowiki>(\S+)(\s?)+$\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}}

Method 2: use [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor]. {{exclaim}} This method does not handle spaces at the end of a line.
* Find what: {{kbd | key = <nowiki>(.*)$\n</nowiki>}} or {{kbd | key = <nowiki>(\S+)$\n</nowiki>}} or {{kbd | key = <nowiki>(\S+)\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}}

</div>

<div style="width: 46%; float: left; margin:0 auto; position: absolute; display: block; left: 54%; top: 0;">
==== Unwrap quoted text back into one item per line (removing the commas) ====
<pre>
// before
'Elmo', 'Emie', 'Granny Bird'

// after
Elmo
Emie
Granny Bird
</pre>
Method: use [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor]. This method is intended to tolerate one or more spaces before or after each item.
* Find what: {{kbd | key = <nowiki>'(([^,|^'])+)',?\s?</nowiki>}}
* Replace with: {{kbd | key = <nowiki>\1\n</nowiki>}}

</div>
</div>

<div style="clear:both;">&nbsp;</div>

=== Find Non-ASCII Characters (Chinese/Non-English Text) ===

==== In LibreOffice ====
<pre>[^\u0000-\u0080]+</pre>

==== Find Chinese Characters in Google Sheets ====
Example: If cell {{kbd | key=A2}} contains any Chinese character, display “Chinese”, otherwise display “English”:
<pre>=IF(REGEXMATCH(A2, "[\一-\龥]"), "Chinese", "English")</pre>

==== Find Non-ASCII Characters in Google Sheets ====
Extract non-ASCII characters (such as Chinese, Japanese, emoji, etc.) from cell {{kbd | key=A2}}:
<pre>
=IF(ISERROR(REGEXEXTRACT(A2, "[^\x00-\x80]+")), "", REGEXEXTRACT(A2, "[^\x00-\x80]+"))
</pre>

Explanation of regular expression {{kbd | key=<nowiki>[^\x00-\x80]+</nowiki>}}:
* {{kbd | key=<nowiki>[\x00-\x80]</nowiki>}}: Character codes 0-128, that is, the ASCII range plus one extra code. (1) The standard ASCII range is 0-127 ({{kbd | key=<nowiki>0x00-0x7F</nowiki>}}, written {{kbd | key=<nowiki>[\x00-\x7F]</nowiki>}})<ref>[https://www.commfront.com/pages/ascii-chart ASCII Chart – CommFront]</ref>. (2) Character 128 ({{kbd | key=<nowiki>0x80</nowiki>}}) is actually the first character in the extended ASCII range, not part of the original ASCII standard.<ref>[https://en.wikipedia.org/wiki/UTF-8 UTF-8 - Wikipedia]</ref><ref>[https://en.wikipedia.org/wiki/Control_character Control character - Wikipedia]</ref>
* {{kbd | key=<nowiki>[^...]</nowiki>}}: Means “not” these characters
* {{kbd | key=<nowiki>+</nowiki>}}: Means one or more

Overall meaning: Matches one or more non-ASCII characters

==== Find Chinese Characters in MySQL ====
Find rows where <code>column_name</code> contains Chinese characters. The query relies on UTF-8 encoding, in which the characters U+4E00 to U+9FFF start with a byte from E4 to E9:
<pre lang="sql">SELECT `column_name`
FROM `table_name`
WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';</pre>

Find rows where <code>column_name</code> contains only Chinese characters:
<pre lang="sql">SELECT `column_name`
FROM `table_name`
WHERE `column_name` REGEXP '^[一-龯]+$';</pre>

Explanation:
* {{kbd | key=<nowiki>[一-龯]</nowiki>}} - Character set that matches all characters from “一” to “龯” in Unicode
* “一” has Unicode code point {{kbd | key=<nowiki>U+4E00</nowiki>}}<ref>[https://www.compart.com/en/unicode/U+4E00 “一” U+4E00 CJK Unified Ideograph-4E00 Unicode Character]</ref>
* “龯” has Unicode code point {{kbd | key=<nowiki>U+9FAF</nowiki>}}<ref>[https://www.compart.com/en/unicode/U+9FAF “龯” U+9FAF CJK Unified Ideograph-9FAF Unicode Character]</ref>
* This range lies inside the CJK Unified Ideographs block (U+4E00-U+9FFF) and already covers over 99% of daily Chinese usage requirements. [https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_B Extension B] and later blocks mainly contain ancient Chinese characters, variant characters, etc., which rarely appear in modern texts

==== Find Non-ASCII Characters in MySQL ====
Find rows where <code>column_name</code> is not entirely ASCII characters:
<syntaxhighlight lang="sql">SELECT `column_name`
FROM `table_name`
WHERE `column_name` <> CONVERT(`column_name` USING ASCII)</syntaxhighlight>

=== Replace Non-English Text ===
Applies to: the RegExReplace function in Google Drive and the search in Notepad++
<pre>
[^\x00-\x80]+
</pre>
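
The same character-class idea carries over to a scripting language. The following is a minimal Python sketch, assuming the standard <code>re</code> module; the constant names are illustrative, and the sketch uses the strict ASCII upper bound <code>\x7F</code> rather than <code>\x80</code>.

```python
import re

# Runs of characters outside strict ASCII (0x00-0x7F).
NON_ASCII = re.compile(r"[^\x00-\x7F]+")
# Runs of characters in the CJK Unified Ideographs block (U+4E00-U+9FFF).
CJK = re.compile(r"[\u4e00-\u9fff]+")

text = "What Does the Fox Say? 12 狐狸怎叫 34"

non_ascii_runs = NON_ASCII.findall(text)
cjk_runs = CJK.findall(text)
# A string is ASCII-only when no non-ASCII run can be found in it.
is_ascii_only = NON_ASCII.search("Hello World") is None
```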


Non-English text can also be replaced in the Multi-Rename tool of Total Commander<ref>To replace non-English text while leaving the <code>.</code> character alone: <nowiki>[^\u0000-\u0080|.]+ </nowiki></ref>:
<pre>
[^\u0000-\u0080]+
</pre>

Reference: [http://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters javascript - Regular expression to match non-english characters? - Stack Overflow]


==== Find Chinese Characters in PHP ====
'''Exact match:'''

<syntaxhighlight lang="php">$string = '狐狸怎叫'; // sample input

// Approach 1
if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $string)) {
    echo "All text is Chinese characters" . PHP_EOL;
} else {
    echo "Some text is not Chinese characters" . PHP_EOL;
}

// Approach 2
if (preg_match('/^[\p{Han}]+$/u', $string)) {
    echo "All text is Chinese characters" . PHP_EOL;
} else {
    echo "Some text is not Chinese characters" . PHP_EOL;
}</syntaxhighlight>
'''Partial match:'''

<syntaxhighlight lang="php">// Approach 1
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\p{Han}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);

// Approach 2
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\x{4e00}-\x{9fa5}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);</syntaxhighlight>


=== Add a Comma at the Start of Each Line ===
Using Notepad++:
# Menu: Find -&gt; Replace
# Search mode: Check “Regular expression”
#* Find: {{kbd | key=(.*)}} or {{kbd | key=^(.*)$}}
#* Replace with: {{kbd | key=,\1}} or {{kbd | key=,$1}}

Reference: [http://stackoverflow.com/questions/8413237/notepad-regex-search-replace-how-to-append-and-prepend-a-character-at-start-a Notepad++ RegEx Search/Replace: How to append and prepend a character at start and end of each file line? - Stack Overflow]


=== Known Start and End, Forgotten Middle ===
Using Notepad++:
# Menu: Find -&gt; Replace
# Search mode: Check “Regular expression”
#* Find: {{kbd | key=a(.*)le}} finds text that starts with “a” and ends with “le”, such as (1) apple and (2) apps lesson; the middle may contain spaces. {{exclaim}} When searching for Chinese strings, converting the document to UTF-8 encoding is recommended.
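
The Notepad++ searches for adding a comma at the start of each line and for a known start and end (<code>a(.*)le</code>) can be reproduced in a script. The following is a minimal Python sketch, assuming the standard <code>re</code> module; the sample strings are made up for illustration. It also shows that <code>.*</code> is greedy, which matters when the ending text occurs more than once.

```python
import re

text = "alpha\nbeta"
# re.MULTILINE makes ^ and $ match at the start and end of every line.
prefixed = re.sub(r"^(.*)$", r",\1", text, flags=re.MULTILINE)

sentence = "apple pie, apps lesson"
# Greedy: .* runs to the LAST "le" on the line.
greedy = re.search(r"a(.*)le", sentence).group(0)
# Lazy: .*? stops at the FIRST "le" after the starting "a".
lazy = re.search(r"a(.*?)le", sentence).group(0)
```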


=== Find ASCII Characters in PHP ===
'''Code I:'''

<syntaxhighlight lang="php">$keyword = "Hello World"; // sample input
// Looks for any character outside 0x20-0x7F; note that tabs and newlines are outside this range too
if (preg_match('/[^\x20-\x7f]/', $keyword) === 0) {
    echo "The keyword is ASCII only";
} else {
    echo "The keyword contains non-ASCII characters (like Chinese, Japanese, etc.)";
}</syntaxhighlight>
'''Code II:'''

<syntaxhighlight lang="php">$pattern = '/^[[:ascii:]]+$/i';
$text = "Hello World"; // ASCII only
if (preg_match($pattern, $text)) {
    echo "Pure ASCII characters";
} else {
    echo "Contains non-ASCII characters";
}</syntaxhighlight>


=== Remove Empty Lines ===
'''Original:''' (lines may be separated by one or more blank lines)

<pre>Neo
Trinity


Morpheus

Smith
Oracle</pre>
'''After:''' (each line directly follows the previous one)

<pre>Neo
Trinity
Morpheus
Smith
Oracle</pre>
The following removes one or more blank lines, including lines that contain only spaces or tabs.

'''Using Sublime Text &amp; EmEditor''' (check “Use Regular Expression”; {{exclaim}} this syntax does not work in Notepad++<ref>[http://www.sitepoint.com/forums/showthread.php?448843-Regex-delete-multiple-blank-lines Regex: delete multiple blank lines]</ref>):
* Find: <code>^[\s\t]*$\n</code>
* Replace with: (empty)

'''Using Notepad++ v7.8.7:'''
* Menu: Edit -&gt; Line Operations -&gt; Remove Empty Lines (Containing Blank characters)<ref>[http://stackoverflow.com/questions/3866034/removing-empty-lines-in-notepad regex - Removing empty lines in Notepad++ - Stack Overflow]</ref>

For details, see [[Regular replace blank lines]].
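
Blank-line removal can also be done in a script. The following is a minimal Python sketch, assuming the standard <code>re</code> module; it uses a close variant of the editor pattern rather than the exact same expression, and the sample text is illustrative.

```python
import re

text = "Neo\nTrinity\n\n \t\nMorpheus\n\nSmith\nOracle\n"

# [^\S\n]* matches spaces and tabs but never the line break itself,
# so each match is exactly one blank (or whitespace-only) line.
cleaned = re.sub(r"^[^\S\n]*\n", "", text, flags=re.MULTILINE)
```

Using <code>[^\S\n]</code> instead of <code>\s</code> keeps a single match from spanning several lines, which makes the behaviour easier to reason about.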


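=== Add a Trailing Space to Each Line ===
Method 1 (Sublime Text, EmEditor):
# Menu: Search -&gt; Replace
# Check “Use Regular Expression”
#* Find: {{kbd | key = <nowiki>\n</nowiki>}}
#* Replace with: {{kbd | key = <nowiki>_\n</nowiki>}} (replace the _ before {{kbd | key = <nowiki>\n</nowiki>}} with a regular space)
# Click “Replace all”

Method 2 (Sublime Text, EmEditor):
# Menu: Search -&gt; Replace
# Check “Use Regular Expression”
#* Find: {{kbd | key = <nowiki>$</nowiki>}}
#* Replace with: {{kbd | key = <nowiki>_$</nowiki>}} (replace the _ before {{kbd | key = <nowiki>$</nowiki>}} with a regular space)
# Click “Replace all”

{{exclaim}} Check whether the file ends with a blank line. If it does not, the replacement rule is not applied to the last line.


=== Find Non-Whitespace Text ===
* Find: <code>[^\s]+</code> ([https://regex101.com/r/zH7wV3/1 online demo])


=== Convert Symbol-Separated Text to Line-by-Line Display ===
'''Example:''' text separated by the ideographic comma (、)

<pre>Before: 尼歐、莫斐斯、崔妮蒂、史密斯、祭師
After:
尼歐
莫斐斯
崔妮蒂
史密斯
祭師</pre>
'''Using [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor]:'''
* Find: <code>([^、]+)([、]{1})</code>
* Replace with: <code>\1\n</code>

Syntax explanation:
* <nowiki>[^、]</nowiki> : any single character other than the ideographic comma (、)
* <nowiki>[^、]+</nowiki> : one or more characters that are not 、
* <nowiki>([^、]+)</nowiki> : captures that run of characters as group 1
* <nowiki>[、]</nowiki> : a single 、
* <nowiki>[、]{1}</nowiki> : 、 occurring exactly once
* <nowiki>([、]{1})</nowiki> : captures that single 、 as group 2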
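
The same find-and-replace can be run in a script. The following is a minimal Python sketch, assuming the standard <code>re</code> module; the variable names are illustrative only.

```python
import re

text = "尼歐、莫斐斯、崔妮蒂、史密斯、祭師"

# Capture each run of non-separator characters and replace the
# separator that follows it with a line break.
one_per_line = re.sub(r"([^、]+)、", r"\1\n", text)
names = one_per_line.split("\n")
```

The last item has no separator after it, so it is left untouched and simply becomes the final line.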


=== Replace Multiple Spaces with Tab Characters ===
Convert space-separated field values into tab-separated field values, so that the result can be pasted into MS Excel or [[Google spreadsheet]].
<pre># \t stands for the Tab character
# before
aaa bbb    ccc

# after
aaa\tbbb\tccc
</pre>

Explanation: <code>\S</code> is a non-whitespace character and <code>\r\n</code> are line-break characters, so <code>[^\S\r\n]</code> means “neither a non-whitespace character nor a line break”. In other words, it finds whitespace other than line breaks.

'''Using Sublime Text''' (references<ref>[http://www.techrepublic.com/blog/microsoft-office/quickly-replace-multiple-space-characters-with-a-tab-character/ Quickly replace multiple space characters with a tab character - TechRepublic]</ref> <ref>[http://stackoverflow.com/questions/3469080/match-whitespace-but-not-newlines-perl regex - Match whitespace but not newlines (Perl) - Stack Overflow]</ref>):
# Menu: Search -&gt; Replace
# Check “Use Regular Expression”
#* Find: {{kbd | key = <nowiki>([^\S\n]+)</nowiki>}} or {{kbd | key = <nowiki>([^\S\r\n]+)</nowiki>}} or {{kbd | key = <nowiki>_{1,}</nowiki>}} (replace _ with a regular space). The alternative {{kbd | key = <nowiki>\s\s+</nowiki>}} only matches runs of two or more whitespace characters, and those runs may include line breaks.
#* Replace with: {{kbd | key = <nowiki>\t</nowiki>}}
# Click “Replace all”


=== Remove Extra Whitespace Before and After Each Line ===
=== Remove Leading/Trailing Whitespace ===
==== Remove Leading Whitespace from Each Line ====
* Find: {{kbd | key = <nowiki>^\s+</nowiki>}} --> Replace with: (empty) (works in Sublime Text and EmEditor; "Use Regular Expression" must be enabled)


<pre># before
aaa
bbb
    ccc


# after
==== Remove Leading Whitespace ====
aaa
bbb
ccc
</pre>


* Find: <code>^\s+</code>
* Replace with: (empty)


==== Remove Trailing Whitespace from Each Line ====
* Find: {{kbd | key = <nowiki>\s+$</nowiki>}} --> Replace with: (empty) (works in Sublime Text and EmEditor; "Use Regular Expression" must be enabled)


==== Remove Trailing Whitespace ====


==== Remove Leading or Trailing Whitespace from Each Line ====
* Find: <code>\s+$</code>
* Find: {{kbd | key = <nowiki>(^\s+|\s+$)</nowiki>}} --> Replace with: (empty) (works in Sublime Text and EmEditor; "Use Regular Expression" must be enabled)
* Replace with: (empty)


=== Find Lines Containing Non-Digit Text ===
Every line is expected to contain only digits; find the lines that contain text other than digits.
<pre>
[^\d|\n]
</pre>
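The same check can be sketched outside a text editor. The following is a minimal Python example, assuming the standard `re` module; the function name `lines_with_non_digits` is hypothetical and chosen only for illustration. It uses `[^0-9]` rather than `\d` so that only ASCII digits count as digits.

```python
import re

def lines_with_non_digits(text):
    """Return (line number, line) pairs for lines that contain any non-digit character."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        # [^0-9] matches any single character that is not an ASCII digit.
        if re.search(r"[^0-9]", line)
    ]

print(lines_with_non_digits("123\n45a6\n\n789"))
```

Empty lines are not reported, because they contain no character at all.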


=== Find Hashtags ===
==== Remove Both Leading and Trailing Whitespace ====
[[Extract all hashtags from text]]


=== Find URLs in Article Text ===
* Find: <code>(^\s+|\s+$)</code>
[[Regular extract url from text]]
* Replace with: (empty)


=== Find Long Numbers in Article Text ===
[[Extract large number from text]]


== Search unmatched string ==
== Text Editors Supporting Regular Expressions ==
=== case: find un-commented console.log ===
Original format: some lines contain un-commented [[Javascript debug]] information
<pre>
  console.log("un-commented debug information");


  //console.log("commented debug information");
Various text editors support regular expressions including: - Sublime Text - EmEditor - Notepad++ - Visual Studio Code - Atom - Vim/Neovim
</pre>


Search pattern: find the string "console.log" where the character immediately before it is not the / symbol


<pre>
== Syntax Reference ==
  [^/](console\.log)
</pre>
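Note that <nowiki>[^/](console\.log)</nowiki> needs one character before the call, so it misses a call at the very start of a line, and it still matches <nowiki>// console.log</nowiki> when a space follows the slashes. A line-by-line check avoids both cases. The following is a minimal Python sketch, assuming the standard `re` module; `find_uncommented_logs` is a hypothetical name, and block comments (`/* ... */`) are not handled.

```python
import re

# A line counts as commented out when only spaces or tabs precede the // marker.
UNCOMMENTED_LOG = re.compile(r"^(?![ \t]*//).*console\.log")

def find_uncommented_logs(source):
    """Return 1-based line numbers whose console.log call is not commented out."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if UNCOMMENTED_LOG.match(line)
    ]

sample = 'console.log("a");\n  //console.log("b");\n  // console.log("c");\nfoo(); console.log("d");'
print(find_uncommented_logs(sample))
```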


== Regular expression batch tools ==
* Newline character: <code>\r\n</code> (for Notepad++: Extended mode &amp; Regular expression mode)
Run '''multiple''' regular expression operations on the same file:
* Tab character: <code>\t</code> (for Notepad++: Extended mode)
* {{Gd}} [https://github.com/facelessuser/RegReplace RegReplace] runs multiple replace commands: "Simple find and replace sequencer plugin for Sublime Text" (quoted from the official webpage). {{access | date=2014-10-25}}
* Digits: <code>\d</code> (for Notepad++: Regular expression mode only)
* ''$'' [https://www.emeditor.com/text-editor-features/more-features/batch-replace/ EmEditor (Text Editor) - Batch Replace] & [https://zh-tw.emeditor.com/text-editor-features/coding/regular-expressions/ EmEditor (文字編輯器) | 規則運算式]
* Non-whitespace: <code>\S</code> - Does not include half-width spaces and full-width spaces


Run one regular expression operation on '''multiple''' files:
== Troubleshooting Regular Expressions ==
* ''$''  [https://www.emeditor.com/text-editor-features/more-features/find-replace/ EmEditor (Text Editor) | Find and Replace]


== syntax ==
'''Tips:'''
# Use online tools like regex101 to understand your syntax
# Test with small data: prepare a small file to verify the syntax
# Highlight or output matched text for debugging
# Simplify the syntax when encountering issues
# Try alternative syntax when you hit compatibility issues (e.g., <code>\d</code> to <code>[0-9]</code>)
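* Newline character: \r\n (applies to Notepad++ options: Extended mode & Regular expression)
* Tab character (fixed-width tab separator): \t (applies to Notepad++ option: Extended mode)
* Digits: \d (applies to Notepad++ option: Regular expression. {{exclaim}} Does not apply to Notepad++ option: Extended mode)
* {{kbd | key=<nowiki>\S</nowiki>}} non-whitespace text: does not include half-width or full-width spaces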


== trouble shooting ==
* [http://errerrors.blogspot.com/2015/07/sublime-text-invalid-lookbehind.html Err: 解決 Sublime Text 正則表示式搜尋,遇到的「Invalid lookbehind assertion」錯誤]


== further reading ==
== Alternative Solutions ==
* [http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Searching_And_Replacing SourceForge.net: Searching And Replacing - notepad-plus], [http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions SourceForge.net: Regular Expressions - notepad-plus]
* [http://stackoverflow.com/questions/23020856/text-extraction-with-sublime-text regex - text extraction with sublime text - Stack Overflow] {{access | date=2014-09-26}}
* [https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F 正規表示式 - 維基百科,自由的百科全書]
* [http://www.regular-expressions.info/ Regular-Expressions.info - Regex Tutorial, Examples and Reference - Regexp Patterns]
* [http://linux.vbird.org/linux_basic/0320bash.php 鳥哥的 Linux 私房菜 -- 第十章、認識與學習BASH] {{access | date = 2016-06-08}}


unicode
* Use Tab-separated data that can be easily pasted into Google Sheets or MS Excel
* [http://www.regular-expressions.info/unicode.html Regex Tutorial - Unicode Characters and Properties] {{access | date = 2014-04-02}}
* Copy multiple rows and paste between different applications (compatibility varies)
* [http://php.net/manual/en/regexp.reference.unicode.php PHP: Unicode character properties - Manual] {{access | date = 2014-04-02}}


references
== Further Reading ==
<references/>


== Alternative Solutions ==
* Regular-Expressions.info - Regex Tutorial, Examples and Reference
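* Separate the data with {{kbd |key=Tab}} and paste it into a Google Drive Spreadsheet or MS Excel; each value is stored in its own column automatically. In the raw data to be processed, put a {{kbd |key=Tab}} before and after each piece of data you want to extract, then copy and paste it into a Google Drive Spreadsheet or MS Excel, where the values land in separate columns ready for further processing.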
* Unicode character properties documentation
* Platform-specific regular expression documentation


Copy multiple rows & paste
{{Template: Data factory flow}}
* Copy to dreamweaver from MS Excel 2002: ok
* Copy to dreamweaver from Google Docs: not ok {{exclaim}}
* Copy to MS Excel 2002 from Google Docs: ok


[[Category:Regular expression]] [[Category:Software]] [[Category:Programming]] [[Category:Data Science]] [[Category:Search]] [[Category:Text file processing]]
[[Category: Regular expression]]  
[[Category: Software]]  
[[Category: Programming]]  
[[Category: Data Science]]  
[[Category: Search]]  
[[Category: String manipulation]]
[[Category: Revised with LLMs]]

Latest revision as of 11:55, 11 December 2025

When processing text files through regular expressions, you can quickly search for or replace strings that match specific rules. Processing is done on a line-by-line basis for string manipulation. Regular expressions are also known as regex, regexp, or pattern matching expressions.



Need Help? You can use the online tools listed on this page to debug your expression yourself.


Quick Reference Table

Note: (1) Blue highlighted areas in samples represent text matching the rules, (2) The same text rule can have multiple representations

Each entry gives the text rule, its pattern, and a sample; the opposite rule follows where one exists.

Rule: Any single character (including spaces, but not newline)
Pattern: .
Sample: What Does the Fox Say? 12 狐狸怎叫 34

Rule: Any character (including spaces), appears 1 or 0 times
Pattern: .? = .{0,1}
Sample: What Does the Fox Say? 12 狐狸怎叫 34

Rule: Any number of characters (including spaces), zero or more occurrences
Pattern: .* = .{0,}
Sample: What Does the Fox Say? 12 狐狸怎叫 34

Rule: Any number of characters (including spaces), at least 1 occurrence
Pattern: .+ = .{1,}
Sample: What Does the Fox Say? 12 狐狸怎叫 34

Rule: Any number of spaces or newlines (at least 1 occurrence)
Pattern: \s+
Sample: What Does the Fox Say? 12 狐狸怎叫 34
Opposite rule: Any number of characters (not including spaces or newlines)
Pattern: [^\s]+ = [^\s]{1,} = [\S]+ = [^ ]+
Sample: What Does the Fox Say? 12 狐狸怎叫 34

Rule: Any number of ASCII characters (including English, numbers and spaces)
Pattern: [\x00-\x80]+ or [[:ascii:]]+
Sample: What Does the Fox Say? 12 狐狸怎叫 34
Opposite rule: Non-ASCII, i.e., Chinese characters appearing any number of times
Pattern: [^\x00-\x80]+
Sample: What Does the Fox Say? 12 狐狸怎叫 34

Rule: Any number of uppercase/lowercase English letters, numbers and underscore (_) (not including spaces)
Pattern: [\w]+ = [a-zA-Z0-9_]+
Note: PHP with the u modifier supports Chinese characters
Sample: What Does the Fox Say? 12 狐狸怎叫 _34
Opposite rule: Any number of characters that are not English letters, numbers and underscore (_)
Pattern: \W+ = [^a-zA-Z0-9_]+

Rule: Any number of digits (not including spaces)
Pattern: [\d]+ = [0-9]+
Sample: What Does the Fox Say? 12 狐狸怎叫 34
Opposite rule: Any number of characters not including digits (including spaces)
Pattern: [^\d]+ = [^0-9]+ = \D+
Sample: What Does the Fox Say? 12 狐狸怎叫 34

Rule: Any number of Chinese characters
Pattern: [\p{Han}]+
Sample: What Does the Fox Say? 12 狐狸怎叫 34
Opposite rule: Any number of characters not including Chinese
Pattern: [^\p{Han}]+

Rule: Lines starting with “狐狸”
Pattern: ^狐狸.*$
Samples:
狐狸怎叫 34 What Does the Fox Say?
柴犬怎叫 What Does the shiba inu say?
Opposite rule: Lines not starting with “狐狸”
Pattern: ^(?!狐狸).*$
Samples:
狐狸怎叫 34 What Does the Fox Say?
柴犬怎叫 What Does the shiba inu say?

Rule: Lines ending with “怎叫”
Pattern: ^.*怎叫$
Samples:
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
Opposite rule: Lines not ending with “怎叫”
Pattern: .*(?<!怎叫)$
Samples:
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫

Rule: Lines containing “狐狸”
Pattern: ^.*狐狸.*$ or (狐狸)
Samples:
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
Opposite rule: Lines not containing “狐狸”
Pattern: ^((?!狐狸).)*$
Samples:
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
叫.*狐狸 What Does the Fox Say? 12 狐狸怎叫 34
What Does the Fox Say? 12 不叫狐狸 34
What Does the shiba inu say? 柴犬怎叫
叫).* What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫

What Does the shiba inu say? 柴犬怎了
柴犬).)*$ What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
What Does the Husky say? 哈士奇怎叫
Boolean logic NOT: Lines not containing “狐狸” but containing “柴犬”
^((?!狐狸).)*(柴犬).*$ = ^(柴犬).*((?!狐狸).)*$ = (柴犬).*((?!狐狸).)*
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
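The "lines containing" and "lines not containing" rules can be tried outside an editor. The following is a minimal sketch, assuming Python's standard `re` module (flag behaviour in text editors may differ); the function name `split_lines_by_keyword` and the sample text are illustrative only.

```python
import re

def split_lines_by_keyword(text, keyword):
    """Return (lines containing keyword, lines not containing keyword)."""
    escaped = re.escape(keyword)
    # re.MULTILINE lets ^ and $ anchor at every line, not only at the whole text.
    containing = re.compile(rf"^.*{escaped}.*$", re.MULTILINE)
    # The negative lookahead is tested before every character, so the keyword
    # may not start anywhere on the line.
    not_containing = re.compile(rf"^((?!{escaped}).)*$", re.MULTILINE)
    with_kw = [m.group(0) for m in containing.finditer(text)]
    without_kw = [m.group(0) for m in not_containing.finditer(text)]
    return with_kw, without_kw

sample = "What Does the Fox Say? 12 狐狸怎叫 34\nWhat Does the shiba inu say? 柴犬怎叫"
with_fox, without_fox = split_lines_by_keyword(sample, "狐狸")
print(with_fox)
print(without_fox)
```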


Regular Expression Online Tools

Websites for testing regular expression syntax:


Common Use Cases

Replace Newlines with Commas

Converting email lists into a format usable by email software:

Original:
[email protected]
[email protected]
[email protected]

Convert to:
[email protected],[email protected],[email protected]

Method 1: Sublime Text, EmEditor

  1. Menu: Search -> Replace
  2. Check “Use Regular Expression”
    • Find: \n (newline character)
    • Replace with: ,
  3. Click “Replace all”


Method 2: Notepad++

  1. Menu: Find -> Replace
  2. Search mode: Check “Extended mode” (not “Regular expression”)
    • Find: \n
    • Replace with: ,
  3. Click “Replace All”


Method 3: Microsoft Word

  1. Menu: Edit -> Replace
  2. Check extended mode
    • Find: ^p (paragraph mark)
    • Replace with: ,
  3. Click “Replace All”

Method 4: Sed command for Linux

sed ':a;N;$!ba;s/\n/,/g' old.filename > new.filename
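The same conversion can also be scripted. The following is a minimal Python sketch, assuming the standard `re` module; the function name `join_lines_with_commas` and the example.com addresses are placeholders for illustration.

```python
import re

def join_lines_with_commas(text):
    """Collapse a newline-separated list into one comma-separated line."""
    # strip() drops leading and trailing blank lines so no comma is left dangling;
    # (?:\r?\n)+ treats LF, CRLF, and runs of blank lines as a single separator.
    return re.sub(r"(?:\r?\n)+", ",", text.strip())

addresses = "alice@example.com\nbob@example.com\r\ncarol@example.com\n"
print(join_lines_with_commas(addresses))
```

Handling `\r\n` explicitly matters for files saved on Windows, where replacing only `\n` would leave a stray carriage return before each comma.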

Find IP Addresses (IPv4)

For Notepad++ v.5.9.5:

  • Find: \d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?

For Sublime Text v. 3.2.21:

  • Find: (?:\d{1,3}\.){3}\d{1,3}
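Both patterns accept any group of one to three digits, so a string such as 999.1.1.1 also matches. The following is a minimal Python sketch of one way to filter such candidates, assuming the standard `re` and `ipaddress` modules; the function name `find_ipv4` and the lookaround guards are additions for illustration, not part of the editor patterns above.

```python
import ipaddress
import re

# Same core pattern as above, guarded so it does not start or end inside a longer
# run of digits and dots (a sentence-ending period after the address is allowed).
CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\.?\d)")

def find_ipv4(text):
    """Return the dotted-quad strings in text that are valid IPv4 addresses."""
    found = []
    for candidate in CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ValueError:
            continue
        found.append(candidate)
    return found

print(find_ipv4("ok 192.168.0.1, bad 999.1.1.1, end 10.0.0.1."))
```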

Remove Black Squares (UNIX Line Endings LF)

Using Notepad++:

  1. Menu: Find -> Replace
  2. Search mode: Check “Extended mode”
    • Find: \n\n (2 LF characters)
    • Replace with: \r\n (CR and LF)

Add Quotes Around Elements

Add Quotes Around Array Elements

Before: Elmo, Emie, Granny Bird, Herry Monster, 喀喀獸
After: 'Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸'

Method 1: PHP

$users = array('Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸');
// Single quotes around each element
$result = implode(",", preg_replace('/^(.*?)$/', "'$1'", $users));
// Double quotes around each element
$result = implode(",", preg_replace('/^(.*?)$/', "\"$1\"", $users));
echo $result;

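Method 2: Sublime Text or EmEditor

  • Find: ([^\s,]+)
  • Replace with: '\1' (for single quotes) or "\1" (for double quotes)

Method 3: Notepad++ (Enable “Regular expression” search mode)

  • Find: ([^\s,]+)
  • Replace with: '$1' (for single quotes) or "$1" (for double quotes)

Note: this pattern also stops at spaces, so a multi-word element such as Granny Bird is quoted word by word ('Granny' 'Bird'). To keep such elements whole, use ([^,\s][^,\n]*) instead.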
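The following is a minimal Python sketch of the same idea that keeps multi-word elements together, assuming the standard `re` module; the function name `quote_elements` is hypothetical.

```python
import re

def quote_elements(line, quote="'"):
    """Wrap each comma-separated element in quotes, keeping inner spaces."""
    # [^,\s] starts the match on the first non-space character of an element;
    # the optional group extends it to the last non-space character before the comma.
    pattern = r"[^,\s](?:[^,]*[^,\s])?"
    return re.sub(pattern, lambda m: f"{quote}{m.group(0)}{quote}", line)

print(quote_elements("Elmo, Emie, Granny Bird, Herry Monster, 喀喀獸"))
```

A function is used as the replacement so that the quote character never needs escaping in a replacement template.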

Find Non-ASCII Characters (Chinese/Non-English Text)

In LibreOffice

[^\u0000-\u0080]+


Find Chinese Characters in Google Sheets

Example: If cell A2 contains any Chinese character, display “Chinese”, otherwise display “English”:

=IF(REGEXMATCH(A2, "[\一-\龥]"), "Chinese", "English")

Find Non-ASCII Characters in Google Sheets

Extract non-ASCII characters (such as Chinese, Japanese, emoji, etc.) from cell A2

=IF(ISERROR(REGEXEXTRACT(A2, "[^\x00-\x80]+")), "", REGEXEXTRACT(A2, "[^\x00-\x80]+"))

Explanation of regular expression [^\x00-\x80]+

  • [\x00-\x80]: Represents character codes 0-128. (1) The standard ASCII range is 0-127 (0x00-0x7F, i.e. [\x00-\x7F]).[1] (2) Character 128 (0x80) is the first character of the extended range and not part of the original ASCII standard,[2][3] so [\x00-\x7F] is the stricter choice.
  • [^...]: Means "not" these characters
  • +: Means one or more

Overall meaning: Matches one or more non-ASCII characters
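The following is a minimal Python sketch of the same extraction, assuming the standard `re` module and using the stricter `[\x00-\x7F]` range discussed above; the function name `extract_non_ascii` is hypothetical.

```python
import re

# One or more consecutive characters outside the standard ASCII range 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")

def extract_non_ascii(text):
    """Return every run of non-ASCII characters found in text."""
    return NON_ASCII.findall(text)

print(extract_non_ascii("What Does the Fox Say? 12 狐狸怎叫 34"))
```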

Find Chinese Characters in MySQL

Find rows where column_name contains Chinese characters:

SELECT `column_name`
FROM `table_name`
WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';

Find rows where column_name contains only Chinese characters:

SELECT `column_name`
FROM `table_name`
WHERE `column_name` REGEXP '^[一-龯]+$';

Explanation:

  • [一-龯] - Character set that matches all characters from "一" to "龯" in Unicode
  • "一" has Unicode code point U+4E00[4]
  • "龯" has Unicode code point U+9FEF[5]
  • This range lies within the CJK Unified Ideographs block (U+4E00-U+9FFF), which already covers over 99% of daily Chinese usage requirements. Extension B and later blocks mainly contain ancient Chinese characters, variant characters, etc., which rarely appear in modern texts.
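The same code point range can be checked in a script. The following is a minimal Python sketch, assuming the standard `re` module (which has no `\p{Han}` support, so the explicit range U+4E00 to U+9FEF is used); the function name `is_all_chinese` is hypothetical.

```python
import re

# Explicit code point range from "一" (U+4E00) to "龯" (U+9FEF).
HAN_ONLY = re.compile(r"[\u4e00-\u9fef]+")

def is_all_chinese(text):
    """Return True when text is non-empty and every character is in the range."""
    return HAN_ONLY.fullmatch(text) is not None

print(is_all_chinese("狐狸怎叫"))
print(is_all_chinese("狐狸 Fox"))
```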

Find Non-ASCII Characters in MySQL

Find rows where column_name is not entirely ASCII characters:

SELECT `column_name`
FROM `table_name`
WHERE `column_name` <> CONVERT(`column_name` USING ASCII)

Find Chinese Characters in PHP

Exact match:

// Sample input (assumed for illustration)
$string = '繁體中文';

// Approach 1: explicit code point range U+4E00 to U+9FA5
if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $string)) {
    echo "All text is Chinese characters" . PHP_EOL;
} else {
    echo "Some text is not Chinese characters" . PHP_EOL;
}

// Approach 2: Unicode script property for Han characters
if (preg_match('/^[\p{Han}]+$/u', $string)) {
    echo "All text is Chinese characters" . PHP_EOL;
} else {
    echo "Some text is not Chinese characters" . PHP_EOL;
}

Partial match:

// Approach 1
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\p{Han}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);

// Approach 2
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\x{4e00}-\x{9fa5}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);

Find ASCII Characters in PHP

Code I:

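// Any character outside \x20-\x7f (space through DEL) counts as non-ASCII here,
// so control characters such as tabs and newlines are rejected as well.
if (preg_match('/[^\x20-\x7f]/', $keyword) === 0) {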
    echo "The keyword is ASCII only";
} else {
    echo "The keyword contains non-ASCII characters (like Chinese, Japanese, etc.)";
}

Code II:

$pattern = '/^[[:ascii:]]+$/i';
$text = "Hello World"; // ASCII only
if (preg_match($pattern, $text)) {
    echo "Pure ASCII characters";
} else {
    echo "Contains non-ASCII characters";
}

Remove Empty Lines

Original:

Neo
Trinity

Morpheus


Smith
Oracle

After:

Neo
Trinity
Morpheus
Smith
Oracle

Using Sublime Text & EmEditor:

  • Find: ^[\s\t]*$\n
  • Replace with: (empty)

Using Notepad++ v7.8.7:

  • Menu: Edit -> Line Operations -> Remove Empty Lines (Including Blank Lines)
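The following is a minimal Python sketch of the same clean-up, assuming the standard `re` module; the function name `remove_blank_lines` is hypothetical. It uses `[^\S\n]` (whitespace other than the line feed) so that a match never runs past the end of the blank line.

```python
import re

def remove_blank_lines(text):
    """Delete lines that are empty or contain only whitespace."""
    # With re.MULTILINE, ^ anchors at the start of every line; the pattern then
    # consumes optional spaces or tabs plus the line feed that ends the blank line.
    return re.sub(r"^[^\S\n]*\n", "", text, flags=re.MULTILINE)

original = "Neo\nTrinity\n\nMorpheus\n\n\nSmith\nOracle\n"
print(remove_blank_lines(original))
```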

Find Non-Whitespace Text

  • Find: [^\s]+

Convert Symbol-Separated Text to Line-by-Line Display

Example:

Before: 尼歐、莫斐斯、崔妮蒂、史密斯、祭師
After:
尼歐
莫斐斯
崔妮蒂
史密斯
祭師

Using Sublime Text or EmEditor:

  • Find: ([^、]+)([、]{1})
  • Replace with: \1\n
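The following is a minimal Python sketch of the same conversion, assuming the standard `re` module; the function name `split_to_lines` and its `separators` parameter are hypothetical, added so that more than one separator symbol can be handled.

```python
import re

def split_to_lines(text, separators="、"):
    """Put each separator-delimited item on its own line."""
    # Build a character class from the separators; re.escape keeps symbols
    # such as ] or ^ literal inside the class.
    pattern = "[" + re.escape(separators) + "]"
    parts = [part.strip() for part in re.split(pattern, text)]
    # Drop empty items produced by doubled or trailing separators.
    return "\n".join(part for part in parts if part)

print(split_to_lines("尼歐、莫斐斯、崔妮蒂、史密斯、祭師"))
```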

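Replace Multiple Spaces with Tab Characters

Before: aaa bbb    ccc
After: aaa\tbbb\tccc

Using Sublime Text:

  • Find: ([^\S\n]+) or ([^\S\r\n]+)
  • Replace with: \t

[^\S\r\n] matches whitespace other than line breaks. The shorter \s\s+ is sometimes used instead, but it only matches runs of two or more whitespace characters (so the single space in "aaa bbb" stays) and it can also match line breaks.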
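The following is a minimal Python sketch of the same replacement, assuming the standard `re` module; the function name `spaces_to_tabs` is hypothetical.

```python
import re

def spaces_to_tabs(text):
    """Replace each run of spaces or tabs with a single tab, keeping line breaks."""
    # [^\S\r\n] is "whitespace except CR and LF", so rows stay on separate lines.
    return re.sub(r"[^\S\r\n]+", "\t", text)

print(repr(spaces_to_tabs("aaa bbb    ccc\nddd  eee")))
```

The tab-separated result can then be pasted into a spreadsheet, where each value lands in its own column.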


Remove Leading/Trailing Whitespace

Remove Leading Whitespace

  • Find: ^\s+
  • Replace with: (empty)


Remove Trailing Whitespace

  • Find: \s+$
  • Replace with: (empty)


Remove Both Leading and Trailing Whitespace

  • Find: (^\s+|\s+$)
  • Replace with: (empty)
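The following is a minimal Python sketch of the combined rule, assuming the standard `re` module; the function name `strip_each_line` is hypothetical. In Python, `\s` also matches line feeds, so this sketch swaps in `[^\S\n]` to keep blank lines and line breaks intact.

```python
import re

def strip_each_line(text):
    """Remove leading and trailing spaces or tabs from every line."""
    # re.MULTILINE makes ^ and $ anchor at each line; [^\S\n] is whitespace
    # other than the line feed, so lines are never merged.
    return re.sub(r"^[^\S\n]+|[^\S\n]+$", "", text, flags=re.MULTILINE)

print(repr(strip_each_line("  aaa  \n\tbbb\nccc   ")))
```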


Text Editors Supporting Regular Expressions

Various text editors support regular expressions, including:

  • Sublime Text
  • EmEditor
  • Notepad++
  • Visual Studio Code
  • Atom
  • Vim/Neovim


Syntax Reference

  • Newline character: \r\n (for Notepad++: Extended mode & Regular expression mode)
  • Tab character: \t (for Notepad++: Extended mode)
  • Digits: \d (for Notepad++: Regular expression mode only)
  • Non-whitespace: \S - Does not include half-width spaces and full-width spaces

Troubleshooting Regular Expressions

Tips:

  1. Use online tools like regex101 to understand your syntax
  2. Test with small data: prepare a small file to verify the syntax
  3. Highlight or output matched text for debugging
  4. Simplify the syntax when encountering issues
  5. Try alternative syntax when you hit compatibility issues (e.g., \d to [0-9])
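For the tip about outputting matched text, the following is a minimal Python sketch, assuming the standard `re` module; the function name `describe_matches` is hypothetical. It lists where each match starts and ends, which makes it easier to see why a pattern matches more or less than expected.

```python
import re

def describe_matches(pattern, text):
    """Return (start, end, matched text) for every match of pattern in text."""
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, text)]

# Small test data, as suggested in the tips above.
for start, end, matched in describe_matches(r"\d+", "a12b345"):
    print(start, end, repr(matched))
```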


Alternative Solutions

  • Use Tab-separated data that can be easily pasted into Google Sheets or MS Excel
  • Copy multiple rows and paste between different applications (compatibility varies)

Further Reading

  • Regular-Expressions.info - Regex Tutorial, Examples and Reference
  • Unicode character properties documentation
  • Platform-specific regular expression documentation
