Regular expression: Difference between revisions

From LemonWiki共筆
Jump to navigation Jump to search
mNo edit summary
 
(94 intermediate revisions by the same user not shown)
Line 1: Line 1:
處理文字檔時,可以快速地搜尋或取代符合特定規則的字串。以每行為單位,進行字串處理<ref>[http://linux.vbird.org/linux_basic/0330regularex.php 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理]</ref>。 正規表示法 (Regular Expression),又稱正規表示式、正則表達式、正規表示法、正規運算式、規則運算式、常規表示法<ref>[https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F 正規表示式 - 維基百科,自由的百科全書]</ref>。
When processing text files through regular expressions, you can quickly search for or replace strings that match specific rules. Processing is done on a line-by-line basis for string manipulation. Regular expressions are also known as regex, regexp, or pattern matching expressions.


{{Raise hand | text = 有問題嗎?可以利用提供解說的[[Regular_expression#Regular_expression_online_tools | 線上工具]],嘗試自己除錯。 也可以到[http://www.ptt.cc/bbs/RegExp/index.html 看板 RegExp 文章列表 - 批踢踢實業坊]或其他[[問答服務]]詢問。 }}
{{LanguageSwitcher | content = [[Regular expression | English]], [[Regular expression in Mandarin|漢字]]}}


== 快速查表 ==
{{Raise hand | text = '''Need Help?''' You can use the provided explanatory [[#regular-expression-online-tools|online tools]] to try debugging yourself. }}
說明: (1) sample 藍色網底處代表符合規則的文字、(2) 同一文字規則可以有多種表示法
<table border="1" style="width:100%">
<tr >
<th style="background-color: #E0E0E0;"> 文字規則 </th>
<th style="background-color: #E0E0E0; width:260px;"> sample </th>
<th style="background-color: #9c9ca3;"> 對立的文字規則 </th>
<th style="background-color: #9c9ca3; width:260px;"> sample</th>
</tr>
<tr>
<td> 任意一個文字(包含空白,但不包含換行符號) <br /> {{kbd | key = <nowiki>.</nowiki>}} </td>
<td><span style="background:#C6E3FF">W</span>hat Does the Fox Say? 12 狐狸怎叫 34</td>
<td></td>
<td></td>
</tr>
<tr>
<td> 任意文字(包含空白),出現1次或0次 <br /> {{kbd | key = <nowiki>.?</nowiki>}} = {{kbd | key = <nowiki>.{0,1}</nowiki>}}</td>
<td><span style="background:#C6E3FF">W</span>hat Does the Fox Say? 12 狐狸怎叫 34</td>
<td></td>
<td></td>
</tr>
<tr>
<td> 任意次的多個文字(包含空白) <br /> {{kbd | key = <nowiki>.*</nowiki>}} ={{kbd | key = <nowiki> .{0,}</nowiki>}}</td>
<td><span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span></td>
<td></td>
<td></td>
</tr>
<tr>
<td> 任意次的文字(包含空白),至少出現1次 <br /> {{kbd | key = <nowiki>.+</nowiki>}} = {{kbd | key = <nowiki>.{1,}</nowiki>}}</td>
<td><span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span></td>
<td></td>
<td></td>
</tr>
<tr>
<td> 任意次的空白或換行符號 (至少出現1次的空白或換行符號)  <br /> {{kbd | key = <nowiki>\s+</nowiki>}} </td>
<td>What<span style="background:#C6E3FF"> </span>Does the Fox Say? 12 狐狸怎叫 34</td>
<td>任意多個文字(不包含空白或換行符號) <br /> {{kbd | key = <nowiki>[^\s]+</nowiki>}} ={{kbd | key = <nowiki> [^\s]{1,}</nowiki>}} = {{kbd | key = <nowiki> [\S]+</nowiki>}} = {{kbd | key = <nowiki>[^ ]+</nowiki>}}</td>
<td><span style="background:#C6E3FF">What</span> Does the Fox Say? 12 狐狸怎叫 34</td>
</tr>
<tr>
<td> 任意次的 ASCII character(包含英文、數字和空白) [http://regexr.com/3aom2 demo]<ref>[http://www.asciitable.com/ understand]</ref> <br /> {{kbd | key = <nowiki>[\x00-\x80]+</nowiki>}}</td>
<td><span style="background:#C6E3FF">What Does the Fox Say? 12</span> 狐狸怎叫 34</td>
<td>非 ASCII,即中文出現任意次<br /> {{kbd | key = <nowiki>[^\x00-\x80]+</nowiki>}}</td>
<td>What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34</td>
</tr>
<tr>
<td> 任意次的英文、數字和底線符號( _ )文字(不包含空白) <br /> {{kbd | key = <nowiki>[\w]+</nowiki>}} = {{kbd | key = <nowiki>[a-zA-Z0-9_]+</nowiki>}} </td>
<td><span style="background:#C6E3FF">What</span> Does the Fox Say? 12 狐狸怎叫 34</td>
<td> 任意次的不是英文、數字和底線符號( _ )的文字 <br /> {{kbd | key = <nowiki>\W+</nowiki>}} = {{kbd | key = <nowiki>[^a-zA-Z0-9_]+</nowiki>}}</td>
<td>[http://regexr.com/3bk4v demo]</td>
</tr>
<tr>
<td> 任意次的數字(不包含空白) <br /> {{kbd | key = <nowiki>[\d]+</nowiki>}} = {{kbd | key = <nowiki>[0-9]+</nowiki>}}</td>
<td>What Does the Fox Say? <span style="background:#C6E3FF">12</span> 狐狸怎叫 34</td>
<td>不包含數字的任意次文字(包含空白  <br /> {{kbd | key = <nowiki>[^\d]+</nowiki>}} = {{kbd | key = <nowiki>[^0-9]+</nowiki>}} = {{kbd | key = <nowiki>\D+</nowiki>}} </td>
<td><span style="background:#C6E3FF">What Does the Fox Say? </span>12 狐狸怎叫 34</td>
</tr>
<tr>
<td> 以「狐狸」開頭的行 <br /> {{kbd | key = <nowiki>^狐狸.*$</nowiki>}}<ref>[http://www.regular-expressions.info/completelines.html Regex Examples: Matching Whole Lines of Text That Satisfy Certain Requirements]</ref></td>
<td>
<span style="background:#C6E3FF">狐狸怎叫 34 What Does the Fox Say?</span><br />
柴犬怎叫 What Does the shiba inu say?
</td>
<td>不以「狐狸」開頭的行  <br /> {{kbd | key = <nowiki>^(?!狐狸).*$</nowiki>}}<ref>[http://stackoverflow.com/questions/406230/regular-expression-to-match-text-that-doesnt-contain-a-word regex - Regular expression to match text that *doesn't* contain a word? - Stack Overflow]</ref> </td>
<td>
狐狸怎叫 34 What Does the Fox Say?<br />
<span style="background:#C6E3FF">柴犬怎叫 What Does the shiba inu say?</span>
</td>
</tr>
<tr>
<td> 以「怎叫」結尾的行 <br /> {{kbd | key = <nowiki>^.*怎叫$</nowiki>}}
<td>
What Does the Fox Say? 12 狐狸怎叫 34<br />
<span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
</td>
<td>不以「怎叫」結尾的行  <br /> {{kbd | key = <nowiki>.*(?<!怎叫)$</nowiki>}}<ref>[http://stackoverflow.com/questions/16398471/regex-not-ending-with Regex not ending with - Stack Overflow]</ref></td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br />
What Does the shiba inu say? 柴犬怎叫
</td>
</tr>
<tr>
<td> 包含「狐狸」的行 <br /> {{kbd | key = <nowiki>^.*狐狸.*$</nowiki>}}</td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br />
What Does the shiba inu say? 柴犬怎叫
</td>
<td>不包含「狐狸」的行  <br /> {{kbd | key = <nowiki>^((?!狐狸).)*$</nowiki>}} </td>
<td>
What Does the Fox Say? 12 狐狸怎叫 34<br />
<span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫 </span>
</td>
</tr>
<tr>
<td> 布林邏輯 AND: 包含「狐狸」和「叫」的行 ([http://regexr.com/3aokl demo])<ref>[http://stackoverflow.com/questions/469913/regular-expressions-is-there-an-and-operator regex - Regular Expressions: Is there an AND operator? - Stack Overflow]</ref><br /> {{kbd | key = <nowiki>(?=.*狐狸)(?=.*叫).*</nowiki>}}</td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br />
<span style="background:#C6E3FF">What Does the Fox Say? 12 不叫狐狸 34</span><br />
What Does the shiba inu say? 柴犬怎叫
</td>
<td></td>
<td></td>
</tr>
<tr>
<td> 布林邏輯 OR: 包含「狐狸」或「叫」的行 ([http://regexr.com/3aoko demo])<br /> {{kbd | key = <nowiki>.*(狐狸|叫).*</nowiki>}}</td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34<br />
What Does the shiba inu say? 柴犬怎叫</span><br />
What Does the shiba inu say? 柴犬怎了
</td>
<td>布林邏輯: 不包含「狐狸」也不包含「柴犬」的行<br /> {{kbd | key = <nowiki>^((?!狐狸|柴犬).)*$</nowiki>}}</td>
<td>What Does the Fox Say? 12 狐狸怎叫 34<br />
What Does the shiba inu say? 柴犬怎叫<br />
<span style="background:#C6E3FF">What Does the Husky say? 哈士奇怎叫 </span></td>
</tr>
<tr>
<td> 布林邏輯 NOT: 不包含「狐狸」,但包含「柴犬」的行 ([http://regexr.com/3aokr demo])<ref>[http://stackoverflow.com/questions/2953039/regular-expression-for-a-string-containing-one-word-but-not-another regex - Regular expression for a string containing one word but not another - Stack Overflow]</ref><br /> {{kbd | key = <nowiki>^((?!狐狸).)*(柴犬).*$</nowiki>}} = {{kbd | key = <nowiki>^(柴犬).*((?!狐狸).)*$</nowiki>}} = {{kbd | key = <nowiki>(柴犬).*((?!狐狸).)*</nowiki>}}</td>
<td>
What Does the Fox Say? 12 狐狸怎叫 34<br />
<span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
</td>
<td></td>
<td></td>
</tr>
</table>


== Regular expression online tools ==
* [http://regex101.com/ Online regex tester and debugger: JavaScript, Python, PHP, and PCRE] ([http://regex101.com/r/tH1eT7/1 example]) {{Gd}} 有提供語法解說
* [http://www.rubular.com/ Rubular]: a Ruby regular expression editor and tester ([http://www.rubular.com/r/UZuUT5pjeh example])
* [http://gskinner.com/RegExr/ RegExr]: Learn, Build, & Test RegEx ([http://regexr.com/395t0 example]). {{Gd}} 有提供語法解說. 教學: [http://blog.hsdn.net/1426.html RegExr: 功能強大的正規式撰寫協助工具]
* [http://www.phpliveregex.com/ PHP Live Regex] {{access | date=2014-11-25}}
* [http://www.gethifi.com/tools/regex HiFi Regex Tester - Live JavaScript Regular Expression Tester] for Javascript {{access | date=2014-12-23}}
* [http://www.regextester.com/ Regex Tester and Debugger Online - Javascript, PCRE, PHP] {{access | date=2016-01-07}}
* [http://rocksaying.tw/archives/2670695.html Regular Expression (RegExp) in JavaScript - 石頭閒語] {{access | date=2017-11-14}}


examples
== Quick Reference Table ==
* {{Gd}} [http://regexlib.com/ Regular Expression Library] 網友提供的 pattern 範例


== cases ==
Note: (1) Blue highlighted areas in samples represent text matching the rules, (2) The same text rule can have multiple representations
=== 將Email清單,轉成Email軟體可以使用的寄信名單 (取代換行符號為逗號) ===
 
<pre>
{| class="wikitable"
|-
aaa@email.com
! Text Rule
bbb@email.com
! Sample
ccc@email.com
! Opposite Text Rule
! Sample
|-
| Any single character (including spaces, but not newline) <br> <code>.</code>
| <span style="background:#C6E3FF">W</span>hat Does the Fox Say? 12 狐狸怎叫 34
|
|
|-
| Any character (including spaces), appears 1 or 0 times <br> <code>.?</code> = <code>.{0,1}</code>
| <span style="background:#C6E3FF">W</span>hat Does the Fox Say? 12 狐狸怎叫 34
|
|
|-
| Any number of multiple characters (including spaces) <br> <code>.*</code> = <code>.{0,}</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span>
|
|
|-
| Any number of characters (including spaces), at least 1 occurrence <br> <code>.+</code> = <code>.{1,}</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span>
|
|
|-
| Any number of spaces or newlines (at least 1 occurrence) <br> <code>\s+</code>
| What<span style="background:#C6E3FF"> </span>Does the Fox Say? 12 狐狸怎叫 34
| Any number of characters (not including spaces or newlines) <br> <code>[^\s]+</code> = <code>[^\s]{1,}</code> = <code>[\S]+</code> = <code>[^ ]+</code>
| <span style="background:#C6E3FF">What</span> Does the Fox Say? 12 狐狸怎叫 34
|-
| Any number of ASCII characters (including English, numbers and spaces) <br> <code>[\x00-\x80]+</code> or <code>[[:ascii:]]+</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12</span> 狐狸怎叫 34
| Non-ASCII, i.e., Chinese characters appearing any number of times <br> <code>[^\x00-\x80]+</code>
| What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34
|-
| Any number of uppercase/lowercase English letters, numbers and underscore (_) (not including spaces) <br> <code>[\w]+</code> = <code>[a-zA-Z0-9_]+</code> <br> PHP with <code>u</code> modifier supports Chinese characters
| <span style="background:#C6E3FF">What</span> <span style="background:#C6E3FF">Does</span> <span style="background:#C6E3FF">the</span> <span style="background:#C6E3FF">Fox</span> <span style="background:#C6E3FF">Say</span>? <span style="background:#C6E3FF">12</span> 狐狸怎叫 <span style="background:#C6E3FF">_34</span>
| Any number of characters that are not English letters, numbers and underscore (_) <br> <code>\W+</code> = <code>[^a-zA-Z0-9_]+</code>
|
|-
| Any number of digits (not including spaces) <br> <code>[\d]+</code> = <code>[0-9]+</code>
| What Does the Fox Say? <span style="background:#C6E3FF">12</span> 狐狸怎叫 34
| Any number of characters not including digits (including spaces) <br> <code>[^\d]+</code> = <code>[^0-9]+</code> = <code>\D+</code>
| <span style="background:#C6E3FF">What Does the Fox Say? </span>12 狐狸怎叫 34
|-
| Any number of Chinese characters <br> <code>[\p{Han}]+</code>
| What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34
| Any number of characters not including Chinese <br> <code>[^\p{Han}]+</code>
|
|-
| Lines starting with “狐狸” <br> <code>^狐狸.*$</code>
| <span style="background:#C6E3FF">狐狸怎叫 34 What Does the Fox Say?</span><br>柴犬怎叫 What Does the shiba inu say?
| Lines not starting with “狐狸” <br> <code>^(?!狐狸).*$</code>
| 狐狸怎叫 34 What Does the Fox Say?<br><span style="background:#C6E3FF">柴犬怎叫 What Does the shiba inu say?</span>
|-
| Lines ending with “怎叫” <br> <code>^.*怎叫$</code>
| What Does the Fox Say? 12 狐狸怎叫 34<br><span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
| Lines not ending with “怎叫” <br> <code>.*(?&lt;!怎叫)$</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br>What Does the shiba inu say? 柴犬怎叫
|-
| Lines containing “狐狸” <br> <code>^.*狐狸.*$</code> or <code>(狐狸)</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br>What Does the shiba inu say? 柴犬怎叫
| Lines not containing “狐狸” <br> <code>^((?!狐狸).)*$</code>
| What Does the Fox Say? 12 狐狸怎叫 34<br><span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
|-
| Boolean logic AND: Lines containing both “狐狸” and “叫” <br> <code>(?=.*狐狸)(?=.*叫).*</code> or <code>狐狸.*叫\|叫.*狐狸</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br><span style="background:#C6E3FF">What Does the Fox Say? 12 不叫狐狸 34</span><br>What Does the shiba inu say? 柴犬怎叫
|
|
|-
| Boolean logic OR: Lines containing “狐狸” or “叫” <br> <code>.*(狐狸\|叫).*</code>
| <span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34<br>What Does the shiba inu say? 柴犬怎叫</span><br>What Does the shiba inu say? 柴犬怎了
| Boolean logic: Lines not containing “狐狸” and not containing “柴犬” <br> <code>^((?!狐狸\|柴犬).)*$</code>
| What Does the Fox Say? 12 狐狸怎叫 34<br>What Does the shiba inu say? 柴犬怎叫<br><span style="background:#C6E3FF">What Does the Husky say? 哈士奇怎叫</span>
|-
| Boolean logic NOT: Lines not containing “狐狸” but containing “柴犬” <br> <code>^((?!狐狸).)*(柴犬).*$</code> = <code>^(柴犬).*((?!狐狸).)*$</code> = <code>(柴犬).*((?!狐狸).)*</code>
| What Does the Fox Say? 12 狐狸怎叫 34<br><span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
|
|
|}


改成
</pre>


==== 方案1: Sublime Text, EmEditor ====
== Regular Expression Online Tools ==
語法適用 [http://www.sublimetext.com/ Sublime Text], [http://www.emeditor.com/ EmEditor]軟體 (以下為 EmEditor 的操作說明)
# Menu: Search -> Replace
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>\n</nowiki>}} ([https://zh.wikipedia.org/wiki/%E6%8F%9B%E8%A1%8C 換行符號] 。{{Win}} 作業系統的換行符號是 {{kbd | key = <nowiki>\r\n</nowiki>}}、{{Mac}} 作業系統的換行符號是 {{kbd | key = <nowiki>\n</nowiki>}},取兩者共有的符號。如果使用 {{Linux}} 作業系統的換行符號是 {{kbd | key = <nowiki>\r</nowiki>}}。 )
## Replace with: {{kbd | key = <nowiki>, </nowiki>}}
# click "Replace all"


==== 方案2: Notepad++ ====
Websites for testing regular expression syntax:  
使用[http://notepad-plus-plus.org/ Notepad++]軟體
* {{Gd}} [http://regex101.com/ RegEx101] - “Online regex tester and debugger: PHP, PCRE, Python, Golang and JavaScript” - Provides syntax explanations
# 選單: 尋找 -> 取代
* {{Gd}} [http://gskinner.com/RegExr/ RegExr] - Learn, Build, &amp; Test RegEx - Provides syntax explanations
# 搜尋模式: 勾選「增強模式」 (不是勾選「用類型表式」)
* [https://regexper.com/ Regexper] - Visual explanation of syntax using diagrams
## 尋找目標: {{kbd | key = <nowiki>\n</nowiki>}} (換行符號)
* [https://jex.im/regulex/ Regulex:JavaScript Regular Expression Visualizer] - JavaScript Regular Expression Visualizer - Visual explanation using diagrams
## 取代成: {{kbd | key = <nowiki>, </nowiki>}}
* [http://www.rubular.com/ Rubular] - A Ruby regular expression editor and tester
# 勾選全部取代
* [http://www.phpliveregex.com/ PHP Live Regex]
* [http://www.regextester.com/ Regex Tester and Debugger Online] - JavaScript, PCRE, PHP


相關資料: [http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Replacing_Newlines How To Replace Line Ends, thus changing the line layout] last visited: 2010-01-27


==== 方案3: Microsoft Word ====
使用Microsoft Word 2002軟體
# 選單: 編輯 -> 取代
# 勾選增強模式
## 尋找目標: {{kbd | key = <nowiki>^p</nowiki>}} (段落標記)
## 取代為: {{kbd | key = <nowiki>, </nowiki>}}
# 勾選全部取代


==== 方案4: Sed command for linux ====
== Common Use Cases ==


{{kbd | key=<nowiki>sed 's/要被取代的字串/新的字串/g' old.filename > new.filename</nowiki>}}<ref>[http://linux.vbird.org/linux_basic/0330regularex.php#sed_replace 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理]</ref>


(1)要被取代的字串: :a;N;$!ba;s/\n
=== Replace Newlines with Commas ===
(2)新的字串: ;


{{kbd | key=<nowiki>sed ':a;N;$!ba;s/\n/; /g' old.filename > new.filename</nowiki>}} <ref>參考 [http://stackoverflow.com/questions/1251999/sed-how-can-i-replace-a-newline-n unix - sed: How can I replace a newline? ]</ref>
Converting email lists into a format usable by email software:


<pre>Original:


Convert to:


<div style="float: left; width: 100%; position: relative; display: block; clear: left;">
==== Method 1: Sublime Text, EmEditor ====
<div style="width: 46%;  float: left; margin:0 auto; position: relative; display: block; ">
==== 將每行的文字,移除換行,並且都加上逗號分隔 ====
<pre>
// before
Elmo
Emie
Granny Bird


// after
# Menu: Search -&gt; Replace
Elmo, Emie, Granny Bird
# Check “Use Regular Expression”
</pre>
#* Find: <code>\n</code> (newline character)
方法: 使用 [http://www.sublimetext.com/ Sublime Text] 或 [https://zh-tw.emeditor.com/ EmEditor]。
#* Replace with: <code>,</code>
* Find what: {{kbd | key = <nowiki>\n</nowiki>}}
# Click “Replace all”
* Replace with: {{kbd | key = <nowiki>, </nowiki>}} 此例是將每行的文字,都加上逗號+空格分隔 (如果要用別的符號分隔,例如頓號分隔,則是 Replace with: {{kbd | key = <nowiki>、</nowiki>}})


</div>


==== Method 2: Notepad++ ====


<div style="width: 46%; float: left; margin:0 auto; position: absolute; display: block; left: 54%; top: 0;">
# Menu: Find -&gt; Replace
# Search mode: Check “Extended mode” (not “Regular expression”)
#* Find: <code>\n</code>
#* Replace with: <code>,</code>
# Click “Replace All”


==== 將逗號分隔的文字,還原成逐行顯示,並且移除分隔符號 (,) ====
<pre>
// before
Elmo, Emie, Granny Bird


// after
==== Method 3: Microsoft Word ====
Elmo
Emie
Granny Bird
</pre>
方法: 使用 [http://www.sublimetext.com/ Sublime Text] 或 [https://zh-tw.emeditor.com/ EmEditor]。{{exclaim}} 輸出結果的每行前面可能會有空白
* Find what: {{kbd | key = <nowiki>([^,]+),</nowiki>}}
* Replace with: {{kbd | key = <nowiki>\1\n</nowiki>}}


</div>
# Menu: Edit -&gt; Replace
</div>
# Check extended mode
#* Find: <code>^p</code> (paragraph mark)
#* Replace with: <code>,</code>
# Click “Replace All”


<div style="clear:both;">&nbsp;</div>
==== Method 4: Sed command for Linux ====


=== Find IP address ===
<syntaxhighlight lang="bash">sed ':a;N;$!ba;s/\n/; /g' old.filename > new.filename</syntaxhighlight>
使用[http://notepad-plus-plus.org/ Notepad++]軟體 v.5.9.5
# 選單: 尋找 -> 取代
# 搜尋模式: 勾選「用類型表式」
## 尋找目標: \d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?


note: not support {n} syntax
=== Find IP Addresses (IPv4) ===


參考資料:
For Notepad++ v.5.9.5: - Find: <code>\d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?</code>
* [http://sourceforge.net/projects/notepad-plus/forums/forum/331754/topic/4780602 SourceForge.net: Notepad++: Regular expression for IP addresses]
* [http://stackoverflow.com/questions/53497/regular-expression-that-matches-valid-ipv6-addresses regex - Regular expression that matches valid IPv6 addresses - Stack Overflow] {{access | date = 2015-08-10}}


=== 移除記事本純文字檔的黑色方塊(UNIX系統的換行符號 LF ) ===
For Sublime Text v. 3.2.21: - Find: <code>(?:\d{1,3}\.){3}\d{1,3}</code>
使用notepad++軟體
# 選單: 尋找 -> 取代
# 搜尋模式: 勾選「增強模式」
## 尋找目標: \n\n  (註: 2個LF )
## 取代成: \r\n  (註: CR與LF )


用記事本打開純文字檔時,就不會看到黑色方塊
=== Remove Black Squares (UNIX Line Endings LF) ===


Using Notepad++: 1. Menu: Find -&gt; Replace 2. Search mode: Check “Extended mode” - Find: <code>\n\n</code> (2 LF characters) - Replace with: <code>\r\n</code> (CR and LF)


=== 將每項元素,加上引號框起來 ===
=== Add Quotes Around Elements ===
==== 將陣列的每項元素,都加上引號框起來 ====
<pre>
Elmo, Emie, Granny Bird, Herry Monster, 喀喀獸
修改成
'Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸'
</pre>
方法1: 使用 PHP
{{exclaim}} 如果元素包含換行符號,不能用下面方法處理。
<pre>
$users = array('Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸');
//「單引號」相隔每個元素
$result = implode(",", preg_replace('/^(.*?)$/', "'$1'", $users));


//「雙引號」相隔每個元素
==== Add Quotes Around Array Elements ====
$result = implode(",", preg_replace('/^(.*?)$/', "\"$1\"", $users));
echo $result;
</pre>


Thanks, Joshua! More on [http://melikedev.com/2010/02/24/php-wrap-implode-array-elements-in-quotes/ PHP - Wrap Implode Array Elements in Quotes » Me Like Dev]
<pre>Before: Elmo, Emie, Granny Bird, Herry Monster, 喀喀獸
After: 'Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸'</pre>
'''Method 1: PHP'''


方法2: 使用 [http://www.sublimetext.com/ Sublime Text] 或 [https://zh-tw.emeditor.com/ EmEditor]
<syntaxhighlight lang="php">$users = array('Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸');
* Find: {{kbd | key = <nowiki>([^\s|,]+)</nowiki>}}
// Single quotes around each element
* 分隔符號
$result = implode(",", preg_replace('/^(.*?)$/', "'$1'", $users));
**「單引號」相隔每個元素 Replace with: {{kbd | key = <nowiki>'\1'</nowiki>}}
// Double quotes around each element
**「雙引號」相隔每個元素 Replace with: {{kbd | key = <nowiki>"\1"</nowiki>}}
$result = implode(",", preg_replace('/^(.*?)$/', "\"$1\"", $users));
echo $result;</syntaxhighlight>
'''Method 2: Sublime Text or EmEditor''' - Find: <code>([^\s|,]+)</code> - Replace with: <code>'\1'</code> (for single quotes) or <code>&quot;\1&quot;</code> (for double quotes)


方法3: 使用 [https://notepad-plus-plus.org/ Notepad++]。啟用搜尋模式的「用類型表式」
'''Method 3: Notepad++''' (Enable “Regular expression” search mode) - Find: <code>([^\s|,]+)</code> - Replace with: <code>'$1'</code> (for single quotes) or <code>&quot;$1&quot;</code> (for double quotes)
* Find: {{kbd | key = <nowiki>([^\s|,]+)</nowiki>}}
* 分隔符號
**「單引號」相隔每個元素 Replace with: {{kbd | key = <nowiki>'$1'</nowiki>}}
**「雙引號」相隔每個元素 Replace with: {{kbd | key = <nowiki>"$1"</nowiki>}}


=== Find Non-ASCII Characters (Chinese/Non-English Text) ===


<div style="float: left; width: 100%; position: relative; display: block; clear: left;">
<div style="width: 46%;  float: left; margin:0 auto; position: relative; display: block; ">


==== 將每行的文字,都加上引號框起來,並且移除換行 ====
==== In LibreOffice ====
<pre>
// before
Elmo
Emie
Granny Bird


// after
<pre>[^\u0000-\u0080]+</pre>
'Elmo', 'Emie', 'Granny Bird'
</pre>
方法1: 使用 [http://www.sublimetext.com/ Sublime Text] 或 [https://zh-tw.emeditor.com/ EmEditor]。該方法有處理每行的前面或後面可能有一格或多格空白
* Find what: {{kbd | key = <nowiki>(\S+)(\s?)+$\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}}


方法2: 使用 [http://www.sublimetext.com/ Sublime Text] 或 [https://zh-tw.emeditor.com/ EmEditor] {{exclaim}} 該方法沒有處理每行的後面可能有一格或多格空白
* Find what: {{kbd | key = <nowiki>(.*)$\n</nowiki>}} 或 {{kbd | key = <nowiki>(\S+)$\n</nowiki>}} 或 {{kbd | key = <nowiki>(\S+)\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}}


</div>
==== Find Chinese Characters in Google Sheets ====


Example: If cell {{kbd | key=A2}} contains any Chinese character, display “Chinese”, otherwise display “English”:


<div style="width: 46%; float: left; margin:0 auto; position: absolute; display: block; left: 54%; top: 0;">
<pre>=IF(REGEXMATCH(A2, &quot;[\一-\龥]&quot;), &quot;Chinese&quot;, &quot;English&quot;)</pre>


==== 將引號框起來的文字,還原成逐行顯示,並且移除分隔符號 (,) ====
==== Find Non-ASCII Characters in Google Sheets ====
Extract non-ASCII characters (such as Chinese, Japanese, emoji, etc.) from cell {{kbd | key=A2}}
<pre>
<pre>
// before
=IF(ISERROR(REGEXEXTRACT(A2, "[^\x00-\x80]+")), "", REGEXEXTRACT(A2, "[^\x00-\x80]+"))
'Elmo', 'Emie', 'Granny Bird'
</pre>


// after
Explanation of regular expression {{kbd | key=<nowiki>[^\x00-\x80]+</nowiki>}}
Elmo
Emie
Granny Bird
</pre>
方法: 使用 [http://www.sublimetext.com/ Sublime Text] 或 [https://zh-tw.emeditor.com/ EmEditor]。該方法有處理每行的前面或後面可能有一格或多格空白
* Find what: {{kbd | key = <nowiki>'(([^,|^'])+)',?\s?</nowiki>}}
* Replace with: {{kbd | key = <nowiki>\1\n</nowiki>}}


</div>
* {{kbd | key=<nowiki>[\x00-\x80]</nowiki>}}: Represents the ASCII character range (character codes 0-128). (1) Standard ASCII range: 0-127 ({{kbd | key=<nowiki>0x00-0x7F</nowiki>}} aka * {{kbd | key=<nowiki>[\x00-\x7F]</nowiki>}})<ref>[https://www.commfront.com/pages/ascii-chart ASCII Chart – CommFront]</ref> (2) Character 128 (({{kbd | key=<nowiki>0x80</nowiki>}}) is actually the first character in the extended ASCII range, not part of the original ASCII standard.<ref>[https://en.wikipedia.org/wiki/UTF-8 UTF-8 - Wikipedia]</ref><ref>[https://en.wikipedia.org/wiki/Control_character Control character - Wikipedia]</ref>
</div>
* {{kbd | key=<nowiki>[^...]</nowiki>}}: Means "not" these characters
* {{kbd | key=<nowiki>+</nowiki>}}: Means one or more


<div style="clear:both;">&nbsp;</div>
Overall meaning: Matches one or more non-ASCII characters


=== 尋找非英文的文字 ===
==== Find Chinese Characters in MySQL ====
適用: Google Drive 的 RegExReplace 函數、Notepad++的搜尋
<pre>
[^\x00-\x80]+
</pre>


適用: Total commander 的 Multi-Rename tool<ref>取代非英文的文字,但是不包含 . 符號: <nowiki>[^\u0000-\u0080|.]+ </nowiki></ref>
Find rows where <code>column_name</code> contains Chinese characters:
<pre>
[^\u0000-\u0080]+
</pre>


適用: MySQL<ref>[https://stackoverflow.com/questions/9795137/how-to-detect-rows-with-chinese-characters-in-mysql How to detect rows with chinese characters in MySQL? - Stack Overflow]</ref>
<pre lang="sql">SELECT `column_name`
<pre>
# Table `table_name` contains the Chinese characters
SELECT `column_name`
FROM `table_name`
FROM `table_name`
WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';
WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';</pre>
</pre>


參考資料:
Query condition used to match records where the <code>column_name</code> field contains only Chinese characters.
* [http://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters javascript - Regular expression to match non-english characters? - Stack Overflow]
<pre lang="sql">SELECT `column_name`
FROM `table_name`
WHERE `column_name` REGEXP '^[一-]+$';</pre>


=== 尋找中文的文字 ===
Explanation:
中文字包含繁體中文與簡體中文,不包含特殊符號,例如 Emoji:{{kbd | key = }}
* {{kbd | key=<nowiki>[一-龯]</nowiki>}} - Character set that matches all characters from "一" to "龯" in Unicode
PHP:
* "一" has Unicode code point {{kbd | key=<nowiki>U+4E00</nowiki>}}<ref>[https://www.compart.com/en/unicode/U+4E00 “一” U+4E00 CJK Unified Ideograph-4E00 Unicode Character]</ref>
<pre>
* "" has Unicode code point {{kbd | key=<nowiki>U+9FEF</nowiki>}}<ref>[https://www.compart.com/en/unicode/U+9FAF “龯” U+9FAF CJK Unified Ideograph-9FAF Unicode Character]</ref>
// approach 1
* This range U+4E00-U+9FFF already covers over 99% of daily Chinese usage requirements [https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_B Extension B] and later blocks mainly contain ancient Chinese characters, variant characters, etc., which rarely appear in modern texts
if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $string)) {
echo "全部文字都是中文字" . PHP_EOL;
}else{
echo "部分文字不是中文字" . PHP_EOL;
}


// approach 2
==== Find Non-ASCII Characters in MySQL ====
if (preg_match('/^[\p{Han}]+$/u', $string)) {
echo "全部文字都是中文字" . PHP_EOL;
}else{
echo "部分文字不是中文字" . PHP_EOL;
}
</pre>


技術問題除錯:
Find rows where <code>column_name</code> is not entirely ASCII characters:
* 錯誤訊息:<pre>preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 8</pre>


解決方式: [http://php.net/manual/en/function.preg-match.php preg_match()] 需要加上 {{kbd | key = u }} 變數<ref>[https://stackoverflow.com/questions/32375531/preg-match-compilation-failed-character-value-in-x-or-o-is-too-large-a php - preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 27 on line number 25 - Stack Overflow]</ref>
<syntaxhighlight lang="sql">SELECT `column_name`
FROM `table_name`
WHERE `column_name` <> CONVERT(`column_name` USING ASCII)</syntaxhighlight>


參考資料:
==== Find Chinese Characters in PHP ====
* [http://blog.csdn.net/tinyletero/article/details/8201465 unicode编码 \u4e00-\u9fa5 匹配所有中文 - CSDN博客]
* [https://stackoverflow.com/questions/38168419/codeigniter-form-validation-for-chinese-words php - CodeIgniter Form Validation for Chinese Words - Stack Overflow]


=== 將每行文字的行頭加上逗號符號 ===
'''Exact match:'''
使用notepad++軟體
# 選單: 尋找 -> 取代
# 搜尋模式: 勾選「用類型表示」
## 尋找目標: {{kbd | key=(.*)}} 或者是 {{kbd | key=^(.*)$}}
## 取代成: {{kbd | key=,\1}} 或者是 {{kbd | key=,$1}}。


參考資料: [http://stackoverflow.com/questions/8413237/notepad-regex-search-replace-how-to-append-and-prepend-a-character-at-start-a Notepad++ RegEx Search/Replace: How to append and prepend a character at start and end of each file line? - Stack Overflow]
<syntaxhighlight lang="php">// Approach 1
if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $string)) {
    echo "All text is Chinese characters" . PHP_EOL;
} else {
    echo "Some text is not Chinese characters" . PHP_EOL;
}


=== 知道前面跟後面的文字,但是中間文字忘記了 ===
// Approach 2
使用notepad++軟體
if (preg_match('/^[\p{Han}]+$/u', $string)) {
# 選單: 尋找 -> 取代
    echo "All text is Chinese characters" . PHP_EOL;
# 搜尋模式: 勾選「用類型表示」
} else {
## 尋找目標: {{kbd | key=a(.*)le}} 就可以找到(1)apple (2)apps lesson ... 等a開頭、le結尾的文字,中間可夾雜空白。 {{exclaim}} 中文字串搜尋,建議將文件的編碼改成 UTF-8 編碼
    echo "Some text is not Chinese characters" . PHP_EOL;
}</syntaxhighlight>
'''Partial match:'''


<syntaxhighlight lang="php">// Approach 1
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\p{Han}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);


=== 移除空白行 ===
// Approach 2
<pre>
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
# () 每行可能間隔一行空白或多行空白
$pattern = '/[\x{4e00}-\x{9fa5}]+/u';
尼歐
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
崔妮蒂
var_dump($matches);</syntaxhighlight>


莫斐斯
=== Find ASCII Characters in PHP ===


'''Code I:'''


史密斯
<syntaxhighlight lang="php">if (preg_match('/[^\x20-\x7f]/', $keyword) === 0) {
祭師
    echo "The keyword is ASCII only";
} else {
    echo "The keyword contains non-ASCII characters (like Chinese, Japanese, etc.)";
}</syntaxhighlight>
'''Code II:'''


# () 改成每行逐行緊接著
<syntaxhighlight lang="php">$pattern = '/^[[:ascii:]]+$/i';
尼歐
$text = "Hello World"; // ASCII only
崔妮蒂
if (preg_match($pattern, $text)) {
莫斐斯
    echo "Pure ASCII characters";
史密斯
} else {
祭師
    echo "Contains non-ASCII characters";
</pre>
}</syntaxhighlight>


移除一行空白或多行空白( 行內可能包含一個或多個空白字元 {{kbd | key= SPACE}} 、定位鍵{{kbd | key= TAB}})
=== Remove Empty Lines ===
* 使用工具:  適用 Sublime Text 與 EmEditor 軟體,需勾選「使用規則運算式」。{{exclaim}} 以下語法不適用於 Notepad++ 軟體<ref>[http://www.sitepoint.com/forums/showthread.php?448843-Regex-delete-multiple-blank-lines Regex: delete multiple blank lines]</ref>
** 尋找: {{kbd | key=<nowiki>^[\s\t]*$\n</nowiki>}} --> 取代為: 空 (不需要輸入任何字)
* 使用工具: Notepad++
** Notepad++ 軟體選單: 編輯 -> 行列 -> 移除空行(含空白字元 {{kbd | key= SPACE}} )<ref>[http://stackoverflow.com/questions/3866034/removing-empty-lines-in-notepad regex - Removing empty lines in Notepad++ - Stack Overflow]</ref>
* 詳細說明,請見 [[Regular replace blank lines]]


=== 尋找非空白的文字 ===
'''Original:'''
* 尋找: {{kbd | key=<nowiki>[^\s]+</nowiki>}} [https://regex101.com/r/zH7wV3/1 online demo]


=== 將特定符號相隔的文字,改成逐行顯示 ===
<pre>Neo
例子:
Trinity
<pre>
# (原) 頓號(、)符號相隔的文字
尼歐、莫斐斯、崔妮蒂、史密斯、祭師


# (後) 改成逐行顯示
Morpheus
尼歐
莫斐斯
崔妮蒂
史密斯
祭師
</pre>


使用 [http://www.sublimetext.com/ Sublime Text] 或 [https://zh-tw.emeditor.com/ EmEditor]
* Find: {{kbd | key = <nowiki>([^、]+)([、]{1})</nowiki>}}
* Replace with: {{kbd | key = <nowiki>\1\n</nowiki>}}


語法說明
Smith
* <nowiki>[^、]</nowiki> : 符合任意字,但不是頓號(、)的文字
Oracle</pre>
* <nowiki>[^、]+</nowiki> : 一次以上不是頓號(、)的文字
'''After:'''
* <nowiki>([^、]+)</nowiki> : 符合「一次以上不是頓號(、)的文字」規則的文字
* <nowiki>[、]</nowiki>: 出現頓號(、)任意次的文字
* <nowiki>[、]{1}</nowiki> : 出現頓號(、)一次的文字
* <nowiki>([、]{1})</nowiki> : 符合「出現頓號(、)一次的文字」規則的文字


<pre>Neo
Trinity
Morpheus
Smith
Oracle</pre>
'''Using Sublime Text &amp; EmEditor:''' - Find: <code>^[\s\t]*$\n</code> - Replace with: (empty)


=== 將每行文字的結尾處,加入空一格 (半形空白) ===
'''Using Notepad++ v7.8.7:''' - Menu: Edit -&gt; Line Operations -&gt; Remove Empty Lines (Including Blank Lines)
法1: 適用軟體: Sublime Text, EmEditor
# Menu: Search -> Replace
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>\n</nowiki>}}
## Replace with: {{kbd | key = <nowiki>_\n</nowiki>}}(符號 {{kbd | key = <nowiki>\n</nowiki>}} 前面的 _ 自行替換成半形空白)
# click "Replace all"


=== Find Non-Whitespace Text ===


法2: 適用軟體: Sublime Text, EmEditor
* Find: <code>[^\s]+</code>
# Menu: Search -> Replace
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>$</nowiki>}}
## Replace with: {{kbd | key = <nowiki>_$</nowiki>}}(符號 {{kbd | key = <nowiki>$</nowiki>}} 前面的 _ 自行替換成半形空白)
# click "Replace all"


=== Convert Symbol-Separated Text to Line-by-Line Display ===


{{exclaim}} 需要檢查最後一行是否是空白行,如果不是空白行,不會套用到該取代規則
'''Example:'''


=== 將每行文字內夾雜的空白,取代成 Tab 符號 ===
<pre>Before: 尼歐、莫斐斯、崔妮蒂、史密斯、祭師
將原本空白間隔的欄位值,取代成 Tab鍵間隔的欄位值。輸出結果可以方便貼到 MS Excel 或 [[Google spreadsheet]]。
After:
<pre># \t 代表是 Tab 鍵,又稱定位鍵
尼歐
# before
莫斐斯
aaa bbb    ccc
崔妮蒂
史密斯
祭師</pre>
'''Using Sublime Text or EmEditor:''' - Find: <code>([^、]+)([]{1})</code> - Replace with: <code>\1\n</code>


# after
=== Replace Multiple Spaces with Tab Characters ===
aaa\tbbb\tccc
</pre>


說明: \S 代表非空白字元, \r\n 代表換行符號。[^\S\r\n] 則代表不是非空白字元、也不是換行符號。換句話說尋找空白,但不包含換行符號。
'''Before:''' <code>aaa bbb    ccc</code> '''After:''' <code>aaa\tbbb\tccc</code>


使用  Sublime Text 軟體 (參考資料<ref>[http://www.techrepublic.com/blog/microsoft-office/quickly-replace-multiple-space-characters-with-a-tab-character/ Quickly replace multiple space characters with a tab character - TechRepublic]</ref> <ref>[http://stackoverflow.com/questions/3469080/match-whitespace-but-not-newlines-perl regex - Match whitespace but not newlines (Perl) - Stack Overflow]</ref>)
'''Using Sublime Text:''' - Find: <code>([^\S\n]+)</code> or <code>([^\S\r\n]+)</code> or <code>\s\s+</code> - Replace with: <code>\t</code>
# Menu: Search -> Replace
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>([^\S\n]+)</nowiki>}} 或 {{kbd | key = <nowiki>([^\S\r\n]+)</nowiki>}} 或 {{kbd | key = <nowiki>_{1,}</nowiki>}} ( 自行替換 _ 成半形空白)
## Replace with: {{kbd | key = <nowiki>\t</nowiki>}}
# click "Replace all"


=== 移除每行文字前後面可能多個的空白 ===
==== 移除每行文字最前面可能多個的空白 ====
* 尋找: {{kbd | key = <nowiki>^\s+</nowiki>}} --> 取代為: 空白 (適用軟體: Sublime Text、EmEditor,需啟用 "Use Regular Expression" )


<pre># before
=== Remove Leading/Trailing Whitespace ===
aaa
bbb
    ccc


# after
aaa
bbb
ccc
</pre>


==== Remove Leading Whitespace ====


==== 移除每行文字最後面可能多個的空白 ====
* Find: <code>^\s+</code>
* 尋找: {{kbd | key = <nowiki>\s+$</nowiki>}} --> 取代為: 空白 (適用軟體: Sublime Text、EmEditor,需啟用 "Use Regular Expression" )
* Replace with: (empty)




==== 移除每行文字前面或後面可能多個的空白 ====
==== Remove Trailing Whitespace ====
* 尋找: {{kbd | key = <nowiki>(^\s+|\s+$)</nowiki>}} --> 取代為: 空白 (適用軟體: Sublime Text、EmEditor,需啟用 "Use Regular Expression" )


=== 尋找包含不是數字,是文字的行 ===
* Find: <code>\s+$</code>
預期每行資料都是數字,尋找包含不是數字,是文字的行
* Replace with: (empty)
<pre>
[^\d|\n]
</pre>


=== 尋找 Hashtag ===
[[Extract all hashtags from text]]


=== 尋找文章內容中的網址 ===
==== Remove Both Leading and Trailing Whitespace ====
[[Regular extract url from text]]


=== 尋找文章內容中的長數字 ===
* Find: <code>(^\s+|\s+$)</code>
[[Extract large number from text]]
* Replace with: (empty)


== Search unmatched string ==
=== case: find un-commented console.log ===
original format: some lines contains un-commented [[Javascript debug]] information
<pre>
  console.log("un-commented debug information");


  //console.log("commented debug information");
== Text Editors Supporting Regular Expressions ==
</pre>


Search pattern: find not started with the / symbol before the string "console.log"
Various text editors support regular expressions including: - Sublime Text - EmEditor - Notepad++ - Visual Studio Code - Atom - Vim/Neovim


<pre>
  [^/](console\.log)
</pre>


== Regular expression batch tools ==
== Syntax Reference ==
'''multiple''' regular expression operations on the same file
* {{Gd}} [https://github.com/facelessuser/RegReplace RegReplace] 執行多個取代命令 "Simple find and replace sequencer plugin for Sublime Text" Quoted from official webpage. {{access | date=2014-10-25}}
* ''$'' [https://www.emeditor.com/text-editor-features/more-features/batch-replace/ EmEditor (Text Editor) - Batch Replace] & [https://zh-tw.emeditor.com/text-editor-features/coding/regular-expressions/ EmEditor (文字編輯器) | 規則運算式]


one regular expression operations on '''multiple''' files
* Newline character: <code>\r\n</code> (for Notepad++: Extended mode &amp; Regular expression mode)
* ''$''  [https://www.emeditor.com/text-editor-features/more-features/find-replace/ EmEditor (Text Editor) | Find and Replace]
* Tab character: <code>\t</code> (for Notepad++: Extended mode)
* Digits: <code>\d</code> (for Notepad++: Regular expression mode only)
* Non-whitespace: <code>\S</code> - Does not include half-width spaces and full-width spaces


== syntax ==
== Troubleshooting Regular Expressions ==
* 換行符號: \r\n (適用: Notepad++選項: 增強模式 & 用類型表式)
* tab鍵的固定空白分隔: \t  (適用: Notepad++選項: 增強模式)
* 數字: \d (適用: Notepad++選項: 用類型表式。{{exclaim}} 不適用: Notepad++選項: 增強模式)
* {{kbd | key=<nowiki>\S</nowiki>}} 非空白的文字: 不會含括半形空白與全行空白


== Troubleshooting of regular expression ==
'''Tips:''' 1. Use online tools like regex101 to understand your syntax 2. Test with small data: Prepare small file data to verify syntax 3. Highlight or output matched text for debugging 4. Simplify the syntax when encountering issues 5. Try alternative syntax due to compatibility issues (e.g., <code>\d</code> to <code>[0-9]+</code>)
Tips
* Small data test: (1) Prepare the small file data to verify the syntax (2) Using the [[Regular_expression#Regular_expression_online_tools | online tools]]
* Highlight or output the matched text e.g. {{kbd | key=<nowiki>--color</nowiki>}}<ref>[https://www.cyberciti.biz/faq/howto-use-grep-command-in-linux-unix/grep_command_examples/ Grep -color command Examples - nixCraft]</ref> for grep command or output the matches by PHP [http://php.net/manual/en/function.preg-match.php preg_match()] function.
* Simplify the syntax
* Because the compatibility issue, you may try to use the alternative syntax e.g. {{kbd | key=<nowiki>\d</nowiki>}} to {{kbd | key=<nowiki>[0-9]+</nowiki>}}.


Related articles
* [http://errerrors.blogspot.com/2015/07/sublime-text-invalid-lookbehind.html Err: 解決 Sublime Text 正則表示式搜尋,遇到的「Invalid lookbehind assertion」錯誤]


== further reading ==
== Alternative Solutions ==
* [http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Searching_And_Replacing SourceForge.net: Searching And Replacing - notepad-plus], [http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions SourceForge.net: Regular Expressions - notepad-plus]
* [http://stackoverflow.com/questions/23020856/text-extraction-with-sublime-text regex - text extraction with sublime text - Stack Overflow] {{access | date=2014-09-26}}
* [https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F 正規表示式 - 維基百科,自由的百科全書]
* [http://www.regular-expressions.info/ Regular-Expressions.info - Regex Tutorial, Examples and Reference - Regexp Patterns]
* [http://linux.vbird.org/linux_basic/0320bash.php 鳥哥的 Linux 私房菜 -- 第十章、認識與學習BASH] {{access | date = 2016-06-08}}


unicode
* Use Tab-separated data that can be easily pasted into Google Sheets or MS Excel
* [http://www.regular-expressions.info/unicode.html Regex Tutorial - Unicode Characters and Properties] {{access | date = 2014-04-02}}
* Copy multiple rows and paste between different applications (compatibility varies)
* [http://php.net/manual/en/regexp.reference.unicode.php PHP: Unicode character properties - Manual] {{access | date = 2014-04-02}}


references
== Further Reading ==
<references/>


== 替代方案 ==
* Regular-Expressions.info - Regex Tutorial, Examples and Reference
* 將資料以 {{kbd |key=Tab}}來隔開,貼到Google Drive的Spreadsheet或MS Excel,會自動儲存到不同欄位。所以將需要處理的原始資料中,需要擷取的資料的前後,使用{{kbd |key=Tab}}來隔開,複製後貼到於Google Drive的Spreadsheet或MS Excel,就會自動儲存到不同欄位,方便做進一步處理。
* Unicode character properties documentation
* Platform-specific regular expression documentation


Copy multiple rows & paste
{{Template: Data factory flow}}
* Copy to dreamweaver from MS Excel 2002: ok
* Copy to dreamweaver from Google Docs: not ok {{exclaim}}
* Copy to MS Excel 2002 from Google Docs: ok


[[Category:Regular expression]] [[Category:Software]] [[Category:Programming]] [[Category:Data Science]] [[Category:Search]] [[Category:Text file processing]]
[[Category: Regular expression]]  
[[Category: Software]]  
[[Category: Programming]]  
[[Category: Data Science]]  
[[Category: Search]]  
[[Category: String manipulation]]
[[Category: Revised with LLMs]

Latest revision as of 11:55, 11 December 2025

When processing text files through regular expressions, you can quickly search for or replace strings that match specific rules. Processing is done on a line-by-line basis for string manipulation. Regular expressions are also known as regex, regexp, or pattern matching expressions.

🌐 Switch language: English, 漢字


Raise_hand.png Need Help? You can use the provided explanatory online tools to try debugging yourself.


Quick Reference Table[edit]

Note: (1) Blue highlighted areas in samples represent text matching the rules, (2) The same text rule can have multiple representations

Text Rule Sample Opposite Text Rule Sample
Any single character (including spaces, but not newline)
.
What Does the Fox Say? 12 狐狸怎叫 34
Any character (including spaces), appears 1 or 0 times
.? = .{0,1}
What Does the Fox Say? 12 狐狸怎叫 34
Any number of multiple characters (including spaces)
.* = .{0,}
What Does the Fox Say? 12 狐狸怎叫 34
Any number of characters (including spaces), at least 1 occurrence
.+ = .{1,}
What Does the Fox Say? 12 狐狸怎叫 34
Any number of spaces or newlines (at least 1 occurrence)
\s+
What Does the Fox Say? 12 狐狸怎叫 34 Any number of characters (not including spaces or newlines)
[^\s]+ = [^\s]{1,} = [\S]+ = [^ ]+
What Does the Fox Say? 12 狐狸怎叫 34
Any number of ASCII characters (including English, numbers and spaces)
[\x00-\x80]+ or ascii:+
What Does the Fox Say? 12 狐狸怎叫 34 Non-ASCII, i.e., Chinese characters appearing any number of times
[^\x00-\x80]+
What Does the Fox Say? 12 狐狸怎叫 34
Any number of uppercase/lowercase English letters, numbers and underscore (_) (not including spaces)
[\w]+ = [a-zA-Z0-9_]+
PHP with u modifier supports Chinese characters
What Does the Fox Say? 12 狐狸怎叫 _34 Any number of characters that are not English letters, numbers and underscore (_)
\W+ = [^a-zA-Z0-9_]+
Any number of digits (not including spaces)
[\d]+ = [0-9]+
What Does the Fox Say? 12 狐狸怎叫 34 Any number of characters not including digits (including spaces)
[^\d]+ = [^0-9]+ = \D+
What Does the Fox Say? 12 狐狸怎叫 34
Any number of Chinese characters
[\p{Han}]+
What Does the Fox Say? 12 狐狸怎叫 34 Any number of characters not including Chinese
[^\p{Han}]+
Lines starting with “狐狸”
^狐狸.*$
狐狸怎叫 34 What Does the Fox Say?
柴犬怎叫 What Does the shiba inu say?
Lines not starting with “狐狸”
^(?!狐狸).*$
狐狸怎叫 34 What Does the Fox Say?
柴犬怎叫 What Does the shiba inu say?
Lines ending with “怎叫”
^.*怎叫$
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
Lines not ending with “怎叫”
.*(?<!怎叫)$
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
Lines containing “狐狸”
^.*狐狸.*$ or (狐狸)
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
Lines not containing “狐狸”
^((?!狐狸).)*$
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
叫.*狐狸 What Does the Fox Say? 12 狐狸怎叫 34
What Does the Fox Say? 12 不叫狐狸 34
What Does the shiba inu say? 柴犬怎叫
叫).* What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫

What Does the shiba inu say? 柴犬怎了
柴犬).)*$ What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫
What Does the Husky say? 哈士奇怎叫
Boolean logic NOT: Lines not containing “狐狸” but containing “柴犬”
^((?!狐狸).)*(柴犬).*$ = ^(柴犬).*((?!狐狸).)*$ = (柴犬).*((?!狐狸).)*
What Does the Fox Say? 12 狐狸怎叫 34
What Does the shiba inu say? 柴犬怎叫


Regular Expression Online Tools[edit]

Websites for testing regular expression syntax:


Common Use Cases[edit]

Replace Newlines with Commas[edit]

Converting email lists into a format usable by email software:

Original:
[email protected]
[email protected]
[email protected]

Convert to:
[email protected],[email protected],[email protected]

Method 1: Sublime Text, EmEditor[edit]

  1. Menu: Search -> Replace
  2. Check “Use Regular Expression”
    • Find: \n (newline character)
    • Replace with: ,
  3. Click “Replace all”


Method 2: Notepad++[edit]

  1. Menu: Find -> Replace
  2. Search mode: Check “Extended mode” (not “Regular expression”)
    • Find: \n
    • Replace with: ,
  3. Click “Replace All”


Method 3: Microsoft Word[edit]

  1. Menu: Edit -> Replace
  2. Check extended mode
    • Find: ^p (paragraph mark)
    • Replace with: ,
  3. Click “Replace All”

Method 4: Sed command for Linux[edit]

sed ':a;N;$!ba;s/\n/; /g' old.filename > new.filename

Find IP Addresses (IPv4)[edit]

For Notepad++ v.5.9.5: - Find: \d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?

For Sublime Text v. 3.2.21: - Find: (?:\d{1,3}\.){3}\d{1,3}

Remove Black Squares (UNIX Line Endings LF)[edit]

Using Notepad++: 1. Menu: Find -> Replace 2. Search mode: Check “Extended mode” - Find: \n\n (2 LF characters) - Replace with: \r\n (CR and LF)

Add Quotes Around Elements[edit]

Add Quotes Around Array Elements[edit]

Before: Elmo, Emie, Granny Bird, Herry Monster, 喀喀獸
After: 'Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸'

Method 1: PHP

$users = array('Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸');
// Single quotes around each element
$result = implode(",", preg_replace('/^(.*?)$/', "'$1'", $users));
// Double quotes around each element
$result = implode(",", preg_replace('/^(.*?)$/', "\"$1\"", $users));
echo $result;

Method 2: Sublime Text or EmEditor - Find: ([^\s|,]+) - Replace with: '\1' (for single quotes) or "\1" (for double quotes)

Method 3: Notepad++ (Enable “Regular expression” search mode) - Find: ([^\s|,]+) - Replace with: '$1' (for single quotes) or "$1" (for double quotes)

Find Non-ASCII Characters (Chinese/Non-English Text)[edit]

In LibreOffice[edit]

[^\u0000-\u0080]+


Find Chinese Characters in Google Sheets[edit]

Example: If cell A2 contains any Chinese character, display “Chinese”, otherwise display “English”:

=IF(REGEXMATCH(A2, "[\一-\龥]"), "Chinese", "English")

Find Non-ASCII Characters in Google Sheets[edit]

Extract non-ASCII characters (such as Chinese, Japanese, emoji, etc.) from cell A2

=IF(ISERROR(REGEXEXTRACT(A2, "[^\x00-\x80]+")), "", REGEXEXTRACT(A2, "[^\x00-\x80]+"))

Explanation of regular expression [^\x00-\x80]+

  • [\x00-\x80]: Represents the ASCII character range (character codes 0-128). (1) Standard ASCII range: 0-127 (0x00-0x7F aka * [\x00-\x7F])[1] (2) Character 128 ((0x80) is actually the first character in the extended ASCII range, not part of the original ASCII standard.[2][3]
  • [^...]: Means "not" these characters
  • +: Means one or more

Overall meaning: Matches one or more non-ASCII characters

Find Chinese Characters in MySQL[edit]

Find rows where column_name contains Chinese characters:

SELECT `column_name`
FROM `table_name`
WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';

Query condition used to match records where the column_name field contains only Chinese characters.

SELECT `column_name`
FROM `table_name`
WHERE `column_name` REGEXP '^[一-龯]+$';

Explanation:

  • [一-龯] - Character set that matches all characters from "一" to "龯" in Unicode
  • "一" has Unicode code point U+4E00[4]
  • "龯" has Unicode code point U+9FEF[5]
  • This range U+4E00-U+9FFF already covers over 99% of daily Chinese usage requirements Extension B and later blocks mainly contain ancient Chinese characters, variant characters, etc., which rarely appear in modern texts

Find Non-ASCII Characters in MySQL[edit]

Find rows where column_name is not entirely ASCII characters:

SELECT `column_name`
FROM `table_name`
WHERE `column_name` <> CONVERT(`column_name` USING ASCII)

Find Chinese Characters in PHP[edit]

Exact match:

// Approach 1
if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $string)) {
    echo "All text is Chinese characters" . PHP_EOL;
} else {
    echo "Some text is not Chinese characters" . PHP_EOL;
}

// Approach 2
if (preg_match('/^[\p{Han}]+$/u', $string)) {
    echo "All text is Chinese characters" . PHP_EOL;
} else {
    echo "Some text is not Chinese characters" . PHP_EOL;
}

Partial match:

// Approach 1
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\p{Han}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);

// Approach 2
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\x{4e00}-\x{9fa5}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);

Find ASCII Characters in PHP[edit]

Code I:

if (preg_match('/[^\x20-\x7f]/', $keyword) === 0) {
    echo "The keyword is ASCII only";
} else {
    echo "The keyword contains non-ASCII characters (like Chinese, Japanese, etc.)";
}

Code II:

$pattern = '/^[[:ascii:]]+$/i';
$text = "Hello World"; // ASCII only
if (preg_match($pattern, $text)) {
    echo "Pure ASCII characters";
} else {
    echo "Contains non-ASCII characters";
}

Remove Empty Lines[edit]

Original:

Neo
Trinity

Morpheus


Smith
Oracle

After:

Neo
Trinity
Morpheus
Smith
Oracle

Using Sublime Text & EmEditor: - Find: ^[\s\t]*$\n - Replace with: (empty)

Using Notepad++ v7.8.7: - Menu: Edit -> Line Operations -> Remove Empty Lines (Including Blank Lines)

Find Non-Whitespace Text[edit]

  • Find: [^\s]+

Convert Symbol-Separated Text to Line-by-Line Display[edit]

Example:

Before: 尼歐、莫斐斯、崔妮蒂、史密斯、祭師
After:
尼歐
莫斐斯
崔妮蒂
史密斯
祭師

Using Sublime Text or EmEditor: - Find: ([^、]+)([、]{1}) - Replace with: \1\n

Replace Multiple Spaces with Tab Characters[edit]

Before: aaa bbb ccc After: aaa\tbbb\tccc

Using Sublime Text: - Find: ([^\S\n]+) or ([^\S\r\n]+) or \s\s+ - Replace with: \t


Remove Leading/Trailing Whitespace[edit]

Remove Leading Whitespace[edit]

  • Find: ^\s+
  • Replace with: (empty)


Remove Trailing Whitespace[edit]

  • Find: \s+$
  • Replace with: (empty)


Remove Both Leading and Trailing Whitespace[edit]

  • Find: (^\s+|\s+$)
  • Replace with: (empty)


Text Editors Supporting Regular Expressions[edit]

Various text editors support regular expressions including: - Sublime Text - EmEditor - Notepad++ - Visual Studio Code - Atom - Vim/Neovim


Syntax Reference[edit]

  • Newline character: \r\n (for Notepad++: Extended mode & Regular expression mode)
  • Tab character: \t (for Notepad++: Extended mode)
  • Digits: \d (for Notepad++: Regular expression mode only)
  • Non-whitespace: \S - Does not include half-width spaces and full-width spaces

Troubleshooting Regular Expressions[edit]

Tips: 1. Use online tools like regex101 to understand your syntax 2. Test with small data: Prepare small file data to verify syntax 3. Highlight or output matched text for debugging 4. Simplify the syntax when encountering issues 5. Try alternative syntax due to compatibility issues (e.g., \d to [0-9]+)


Alternative Solutions[edit]

  • Use Tab-separated data that can be easily pasted into Google Sheets or MS Excel
  • Copy multiple rows and paste between different applications (compatibility varies)

Further Reading[edit]

  • Regular-Expressions.info - Regex Tutorial, Examples and Reference
  • Unicode character properties documentation
  • Platform-specific regular expression documentation

Data factory flow

[[Category: Revised with LLMs]