Regular expression: Difference between revisions
Regular expressions make it quick to search for, or replace, strings that match a given pattern when processing text files. Matching is done line by line<ref>[http://linux.vbird.org/linux_basic/0330regularex.php 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理]</ref>. In Chinese the term appears under several names: 正規表示法, 正規表示式, 正規表達式, 正則表達式, 正規運算式, 規則運算式 and 常規表示法<ref>[https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F 正規表示式 - 維基百科,自由的百科全書]</ref>.
{{Raise hand | text = Have a question? Try debugging it yourself with one of the [[Regular_expression#Regular_expression_online_tools | online tools]] that explain the syntax. You can also ask on [http://www.ptt.cc/bbs/RegExp/index.html 看板 RegExp 文章列表 - 批踢踢實業坊] or another [[問答服務 | Q&A service]]. }}
== Quick reference ==
Notes: (1) text highlighted in blue in the sample column is what the pattern matches; (2) the same rule can often be written in more than one way.
<table border="1" style="width:100%" class="wikitable"> | |||
<tr > | |||
<th style="background-color: #E0E0E0;"> Pattern </th>
<th style="background-color: #E0E0E0; width:260px;"> sample </th>
<th style="background-color: #9c9ca3;"> Opposite pattern </th>
<th style="background-color: #9c9ca3; width:260px;"> sample</th>
</tr> | |||
<tr> | |||
<td> Any single character (including a space, but not a newline) <br /> {{kbd | key = <nowiki>.</nowiki>}} </td>
<td><span style="background:#C6E3FF">W</span>hat Does the Fox Say? 12 狐狸怎叫 34</td> | |||
<td></td> | |||
<td></td> | |||
</tr> | |||
<tr> | |||
<td> Any character (including a space), zero or one time <br /> {{kbd | key = <nowiki>.?</nowiki>}} = {{kbd | key = <nowiki>.{0,1}</nowiki>}}</td>
<td><span style="background:#C6E3FF">W</span>hat Does the Fox Say? 12 狐狸怎叫 34</td> | |||
<td></td> | |||
<td></td> | |||
</tr> | |||
<tr> | |||
<td> Any characters (including spaces), any number of times <br /> {{kbd | key = <nowiki>.*</nowiki>}} = {{kbd | key = <nowiki>.{0,}</nowiki>}}</td>
<td><span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span></td> | |||
<td></td> | |||
<td></td> | |||
</tr> | |||
<tr> | |||
<td> Any characters (including spaces), at least once <br /> {{kbd | key = <nowiki>.+</nowiki>}} = {{kbd | key = <nowiki>.{1,}</nowiki>}}</td>
<td><span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span></td> | |||
<td></td> | |||
<td></td> | |||
</tr> | |||
<tr> | |||
<td> Whitespace or newline characters, one or more times <br /> {{kbd | key = <nowiki>\s+</nowiki>}} </td>
<td>What<span style="background:#C6E3FF"> </span>Does the Fox Say? 12 狐狸怎叫 34</td>
<td>Any characters other than whitespace or newlines, one or more times <br /> {{kbd | key = <nowiki>[^\s]+</nowiki>}} = {{kbd | key = <nowiki>[^\s]{1,}</nowiki>}} = {{kbd | key = <nowiki>[\S]+</nowiki>}}; {{kbd | key = <nowiki>[^ ]+</nowiki>}} is similar but excludes only the half-width space</td>
<td><span style="background:#C6E3FF">What</span> Does the Fox Say? 12 狐狸怎叫 34</td> | |||
</tr> | |||
<tr> | |||
<td> ASCII characters (letters, digits, spaces and so on), one or more times [http://regexr.com/3aom2 demo]<ref>[http://www.asciitable.com/ Ascii Table - ASCII character codes and html, octal, hex and decimal chart conversion]</ref> <br /> {{kbd | key = <nowiki>[\x00-\x7F]+</nowiki>}} or {{kbd | key = <nowiki>[[:ascii:]]+</nowiki>}}<ref>[https://stackoverflow.com/questions/24903140/regex-for-any-english-ascii-character-including-special-characters php - Regex for Any English ASCII Character Including Special Characters - Stack Overflow]</ref></td>
<td><span style="background:#C6E3FF">What Does the Fox Say? 12</span> 狐狸怎叫 34</td>
<td>Non-ASCII characters (for example Chinese), one or more times<br /> {{kbd | key = <nowiki>[^\x00-\x7F]+</nowiki>}}</td>
<td>What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34</td> | |||
</tr> | |||
<tr> | |||
<td> Upper- and lower-case letters, digits and the underscore ( _ ), one or more times (no spaces) ([https://regex101.com/r/gIKB6a/1 demo])<br /> {{kbd | key = <nowiki>[\w]+</nowiki>}} = {{kbd | key = <nowiki>[a-zA-Z0-9_]+</nowiki>}} <br /> In PHP, adding the {{kbd | key =u}} modifier makes this match Chinese characters as well </td>
<td><span style="background:#C6E3FF">What</span> <span style="background:#C6E3FF">Does</span> <span style="background:#C6E3FF">the</span> <span style="background:#C6E3FF">Fox</span> <span style="background:#C6E3FF">Say</span>? <span style="background:#C6E3FF">12</span> 狐狸怎叫 <span style="background:#C6E3FF">_34</span></td>
<td> Characters that are not letters, digits or the underscore ( _ ), one or more times <br /> {{kbd | key = <nowiki>\W+</nowiki>}} = {{kbd | key = <nowiki>[^a-zA-Z0-9_]+</nowiki>}}</td>
<td>[http://regexr.com/3bk4v demo]</td> | |||
</tr> | |||
<tr> | |||
<td> Digits, one or more times (no spaces) <br /> {{kbd | key = <nowiki>[\d]+</nowiki>}} = {{kbd | key = <nowiki>[0-9]+</nowiki>}}</td>
<td>What Does the Fox Say? <span style="background:#C6E3FF">12</span> 狐狸怎叫 34</td>
<td>Characters that are not digits (spaces included), one or more times <br /> {{kbd | key = <nowiki>[^\d]+</nowiki>}} = {{kbd | key = <nowiki>[^0-9]+</nowiki>}} = {{kbd | key = <nowiki>\D+</nowiki>}} </td>
<td><span style="background:#C6E3FF">What Does the Fox Say? </span>12 狐狸怎叫 34</td> | |||
</tr> | |||
<tr> | |||
<td> Chinese characters, one or more times <br /> {{kbd | key = <nowiki>[\p{Han}]+</nowiki>}} ([https://regex101.com/r/UYkdml/1 demo], [[Regular expression#尋找中文、非英文的文字 | details]])</td>
<td>What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34</td>
<td>Characters that are not Chinese, one or more times <br /> {{kbd | key = <nowiki>[^\p{Han}]+</nowiki>}} ([https://regex101.com/r/Nk9GdA/1 demo])</td>
<td></td> | |||
</tr> | |||
<tr> | |||
<td> Lines that start with 「狐狸」 <br /> {{kbd | key = <nowiki>^狐狸.*$</nowiki>}}<ref>[http://www.regular-expressions.info/completelines.html Regex Examples: Matching Whole Lines of Text That Satisfy Certain Requirements]</ref></td>
<td>
<span style="background:#C6E3FF">狐狸怎叫 34 What Does the Fox Say?</span><br />
柴犬怎叫 What Does the shiba inu say?
</td>
<td>Lines that do not start with 「狐狸」 <br /> {{kbd | key = <nowiki>^(?!狐狸).*$</nowiki>}}<ref>[http://stackoverflow.com/questions/406230/regular-expression-to-match-text-that-doesnt-contain-a-word regex - Regular expression to match text that *doesn't* contain a word? - Stack Overflow]</ref> </td>
<td> | |||
狐狸怎叫 34 What Does the Fox Say?<br /> | |||
<span style="background:#C6E3FF">柴犬怎叫 What Does the shiba inu say?</span> | |||
</td> | |||
</tr> | |||
<tr> | |||
<td> Lines that end with 「怎叫」 <br /> {{kbd | key = <nowiki>^.*怎叫$</nowiki>}}</td>
<td>
What Does the Fox Say? 12 狐狸怎叫 34<br />
<span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span>
</td>
<td>Lines that do not end with 「怎叫」 <br /> {{kbd | key = <nowiki>.*(?<!怎叫)$</nowiki>}}<ref>[http://stackoverflow.com/questions/16398471/regex-not-ending-with Regex not ending with - Stack Overflow]</ref></td>
<td> | |||
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br /> | |||
What Does the shiba inu say? 柴犬怎叫 | |||
</td> | |||
</tr> | |||
<tr> | |||
<td> Lines that contain 「狐狸」 <br /> {{kbd | key = <nowiki>^.*狐狸.*$</nowiki>}} (matches the whole line) or {{kbd | key = <nowiki>(狐狸)</nowiki>}} (matches only the word) ([https://regex101.com/r/UEtYst/1 demo])</td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br />
What Does the shiba inu say? 柴犬怎叫
</td>
<td>Lines that do not contain 「狐狸」 ([https://regex101.com/r/rvncjU/1 demo]) <br /> {{kbd | key = <nowiki>^((?!狐狸).)*$</nowiki>}} </td>
<td> | |||
What Does the Fox Say? 12 狐狸怎叫 34<br /> | |||
<span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫 </span> | |||
</td> | |||
</tr> | |||
<tr> | |||
<td> Boolean AND: lines that contain both 「狐狸」 and 「叫」 ([http://regexr.com/3aokl demo])<ref>[http://stackoverflow.com/questions/469913/regular-expressions-is-there-an-and-operator regex - Regular Expressions: Is there an AND operator? - Stack Overflow]</ref><br /> {{kbd | key = <nowiki>(?=.*狐狸)(?=.*叫).*</nowiki>}} or {{kbd | key = <nowiki>狐狸.*叫|叫.*狐狸</nowiki>}}</td>
<td> | |||
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br /> | |||
<span style="background:#C6E3FF">What Does the Fox Say? 12 不叫狐狸 34</span><br /> | |||
What Does the shiba inu say? 柴犬怎叫 | |||
</td> | |||
<td></td> | |||
<td></td> | |||
</tr> | |||
<tr> | |||
<td> Boolean OR: lines that contain 「狐狸」 or 「叫」 ([https://regexr.com/6cu06 demo])<br /> {{kbd | key = <nowiki>.*(狐狸|叫).*</nowiki>}}</td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34<br />
What Does the shiba inu say? 柴犬怎叫</span><br />
What Does the shiba inu say? 柴犬怎了
</td>
<td>Boolean NOR: lines that contain neither 「狐狸」 nor 「柴犬」<br /> {{kbd | key = <nowiki>^((?!狐狸|柴犬).)*$</nowiki>}}</td>
<td>What Does the Fox Say? 12 狐狸怎叫 34<br /> | |||
What Does the shiba inu say? 柴犬怎叫<br /> | |||
<span style="background:#C6E3FF">What Does the Husky say? 哈士奇怎叫 </span></td> | |||
</tr> | |||
<tr> | |||
<td> Boolean NOT: lines that contain 「柴犬」 but not 「狐狸」 ([http://regexr.com/3aokr demo])<ref>[http://stackoverflow.com/questions/2953039/regular-expression-for-a-string-containing-one-word-but-not-another regex - Regular expression for a string containing one word but not another - Stack Overflow]</ref><br /> {{kbd | key = <nowiki>^((?!狐狸).)*(柴犬).*$</nowiki>}} or {{kbd | key = <nowiki>^(柴犬).*((?!狐狸).)*$</nowiki>}} or {{kbd | key = <nowiki>(柴犬).*((?!狐狸).)*</nowiki>}} (these can give wrong results when a line contains both 狐狸 and 柴犬; {{kbd | key = <nowiki>^(?!.*狐狸).*柴犬.*$</nowiki>}} checks the whole line for 狐狸 first) </td>
<td> | |||
What Does the Fox Say? 12 狐狸怎叫 34<br /> | |||
<span style="background:#C6E3FF">What Does the shiba inu say? 柴犬怎叫</span> | |||
</td> | |||
<td></td> | |||
<td></td> | |||
</tr> | |||
</table> | |||
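The line-level rules in the table can also be tried outside an editor. The following is a minimal sketch, assuming Python's built-in <code>re</code> module, whose syntax is close to (but not identical with) the PCRE flavour used in the demos. The helper name <code>matching</code> and the variable names are only illustrative.

```python
import re

# The two sample lines used throughout the quick-reference table.
lines = [
    "What Does the Fox Say? 12 狐狸怎叫 34",
    "What Does the shiba inu say? 柴犬怎叫",
]

def matching(pattern):
    """Return the sample lines in which the pattern matches somewhere."""
    return [line for line in lines if re.search(pattern, line)]

contains_fox = matching(r"^.*狐狸.*$")              # contains 狐狸
without_fox = matching(r"^((?!狐狸).)*$")           # does not contain 狐狸
ends_with_jiao = matching(r"^.*怎叫$")              # ends with 怎叫
fox_and_jiao = matching(r"(?=.*狐狸)(?=.*叫).*")    # AND: 狐狸 and 叫
shiba_not_fox = matching(r"^((?!狐狸).)*(柴犬).*$")  # 柴犬 but no 狐狸 before it

for name, result in [
    ("contains_fox", contains_fox),
    ("without_fox", without_fox),
    ("ends_with_jiao", ends_with_jiao),
    ("fox_and_jiao", fox_and_jiao),
    ("shiba_not_fox", shiba_not_fox),
]:
    print(name, result)
```

Testing a pattern against a couple of known lines like this is a quick way to confirm that a lookahead rule behaves as intended before running it on a whole file.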
== Regular expression online tools == | |||
Websites for testing regular expression syntax:
* {{Gd}} [http://regex101.com/ RegEx101] "Online regex tester and debugger: PHP, PCRE, Python, Golang and JavaScript" ([http://regex101.com/r/tH1eT7/1 example]). Explains the syntax. Tutorial: [https://www.minwt.com/webdesign-dev/html/20352.html RegEx101正規表示法線上產生器,有沒有選到立馬告訴你|梅問題.教學網]
* {{Gd}} [http://gskinner.com/RegExr/ RegExr]: Learn, Build, & Test RegEx ([http://regexr.com/395t0 example]). Explains the syntax. Tutorial: [http://blog.hsdn.net/1426.html RegExr: 功能強大的正規式撰寫協助工具]
* [https://regexper.com/ Regexper]: explains the syntax as a diagram, e.g. [https://regexper.com/#%5Cd%7B3%7D%28.*%29 \d{3}(.*)]
* [https://jex.im/regulex/ Regulex:JavaScript Regular Expression Visualizer]: explains the syntax as a diagram, e.g. [https://jex.im/regulex/#!flags=&re=%5E(a%7Cb)*%3F%24 ^(a|b)*?$]
* [http://www.rubular.com/ Rubular]: a Ruby regular expression editor and tester ([http://www.rubular.com/r/UZuUT5pjeh example]) | |||
* [http://www.phpliveregex.com/ PHP Live Regex] {{access | date=2014-11-25}} | |||
* [http://www.regextester.com/ Regex Tester and Debugger Online - Javascript, PCRE, PHP] {{access | date=2016-01-07}} | |||
* [http://rocksaying.tw/archives/2670695.html Regular Expression (RegExp) in JavaScript - 石頭閒語] {{access | date=2017-11-14}} | |||
Examples | |||
* {{Gd}} [http://regexlib.com/ Regular Expression Library]: pattern examples contributed by users
== cases == | |||
=== Replace newlines with commas ===
Turn a list of email addresses, one per line, into a recipient list that an email client can use.
<pre>
Before
After
</pre>
# Menu: Search -> Replace
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>\n</nowiki>}} (the [[Return symbol | newline character]]. {{Win}} uses {{kbd | key = <nowiki>\r\n</nowiki>}}, while {{Mac}} (macOS) and {{Linux}} both use {{kbd | key = <nowiki>\n</nowiki>}}, so {{kbd | key = <nowiki>\n</nowiki>}} is the symbol they all share. Only classic Mac OS, before OS X, used {{kbd | key = <nowiki>\r</nowiki>}}. For files with {{Win}} line endings, searching for {{kbd | key = <nowiki>\r?\n</nowiki>}} also removes the leftover {{kbd | key = <nowiki>\r</nowiki>}}.)
## Replace with: {{kbd | key = <nowiki>, </nowiki>}}
# click "Replace all"
<div style="float: left; width: 100%; position: relative; display: block; clear: left;"> | |||
<div style="width: 46%; float: left; margin:0 auto; position: relative; display: block; "> | |||
===== Remove the line breaks and separate the lines with commas =====
<pre> | |||
// before | |||
Elmo | |||
Emie | |||
Granny Bird | |||
// after | |||
Elmo, Emie, Granny Bird | |||
</pre> | |||
Method: use [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor].
* Find what: {{kbd | key = <nowiki>\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>, </nowiki>}} This example separates the lines with a comma plus a space (to use another separator, such as the enumeration comma, use Replace with: {{kbd | key = <nowiki>、</nowiki>}})
</div> | |||
<div style="width: 46%; float: left; margin:0 auto; position: absolute; display: block; left: 54%; top: 0;"> | |||
===== Turn comma-separated text back into one item per line, removing the separator (,) =====
<pre> | |||
// before | |||
Elmo, Emie, Granny Bird | |||
// after | |||
Elmo | |||
Emie | |||
Granny Bird | |||
</pre> | |||
Method: use [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor]. {{exclaim}} Each output line may start with a space; appending {{kbd | key = <nowiki>\s*</nowiki>}} to the Find pattern removes it
* Find what: {{kbd | key = <nowiki>([^,]+),</nowiki>}}
* Replace with: {{kbd | key = <nowiki>\1\n</nowiki>}}
</div> | |||
</div> | |||
<div style="clear:both;"> </div> | |||
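The two conversions above (lines to a comma-separated list, and back) can be sketched in a script as well. This is a minimal illustration assuming Python's <code>re</code> module. The variable names are hypothetical, and <code>\r?\n</code> is used so that both Windows and Unix line endings are handled.

```python
import re

names_by_line = "Elmo\nEmie\nGranny Bird\n"

# Join: drop the final newline first so no trailing separator is produced,
# then replace every remaining line break (\r\n or \n) with ", ".
joined = re.sub(r"\r?\n", ", ", names_by_line.strip())

# Split: a comma plus any spaces after it becomes a newline, which also
# avoids the leading space at the start of each output line.
split_back = re.sub(r",\s*", "\n", joined)

print(joined)
print(split_back)
```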
==== Option 2: Notepad++ ====
# Menu: Search -> Replace
# Search mode: select "Extended" (not "Regular expression")
## Find what: {{kbd | key = <nowiki>\n</nowiki>}} (newline)
## Replace with: {{kbd | key = <nowiki>, </nowiki>}}
# Click "Replace All"
# Menu: Edit -> Replace
# Enable the extended mode
## Find what: {{kbd | key = <nowiki>^p</nowiki>}} (paragraph mark)
## Replace with: {{kbd | key = <nowiki>, </nowiki>}}
# Click "Replace All"
{{kbd | key=<nowiki>sed ':a;N;$!ba;s/\n/; /g' old.filename > new.filename</nowiki>}} <ref>See [http://stackoverflow.com/questions/1251999/sed-how-can-i-replace-a-newline-n unix - sed: How can I replace a newline? ]</ref>
==== Option 5: an editor that supports hexadecimal (HEX) editing ====
Use an editor that supports hexadecimal (HEX) editing, for example [https://itunes.apple.com/tw/app/ihex-hex-editor/id909566003?mt=12 iHex - Hex Editor] for {{Mac}}
# Menu: Edit -> Find
# Find: {{kbd | key=<nowiki>0A</nowiki>}} (the newline byte)
# Replace: {{kbd | key=<nowiki>2c 20</nowiki>}}, where 2c is a comma and 20 is a space
# Save the file
Related material
* [https://www.hexdictionary.com/ Hex Dictionary | Convert Hex / Hexadecimal Numbers to Binary and Decimal]
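The same byte-level replacement can be sketched in a few lines of Python, as an illustration only: it works on raw bytes rather than on text, and the sample data and variable names are assumptions rather than anything taken from a hex editor.

```python
# Sample data encoded as UTF-8 bytes; every line break is the single byte 0A.
raw = "Elmo\nEmie\nGranny Bird".encode("utf-8")

newline_byte = bytes.fromhex("0A")    # line feed
comma_space = bytes.fromhex("2C 20")  # 2C is a comma, 20 is a space

converted = raw.replace(newline_byte, comma_space)
print(converted.decode("utf-8"))
```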
=== Find IP address (IPv4) ===
For [http://notepad-plus-plus.org/ Notepad++] v.5.9.5:
# Menu: Search -> Replace
# Search mode: select "Regular expression"
## Find what: {{kbd | key=<nowiki>\d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?</nowiki>}}
Note: this version does not support the {n} quantifier syntax.
For [https://www.sublimetext.com/ Sublime Text] v. 3.2.21:
# Find: {{kbd | key=<nowiki>(?:\d{1,3}\.){3}\d{1,3}</nowiki>}}
References:
* [https://www.regular-expressions.info/ip.html How to Find or Validate an IP Address] {{access | date = 2019-06-05}} | |||
* [http://sourceforge.net/projects/notepad-plus/forums/forum/331754/topic/4780602 SourceForge.net: Notepad++: Regular expression for IP addresses] | |||
* [http://stackoverflow.com/questions/53497/regular-expression-that-matches-valid-ipv6-addresses regex - Regular expression that matches valid IPv6 addresses - Stack Overflow] {{access | date = 2015-08-10}} | |||
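Both patterns above accept any one- to three-digit group, so a string such as 999.1.1.1 also matches. A minimal Python sketch of one way to tighten this is to find candidates with the regular expression and then check each part numerically. The function name <code>find_ipv4</code> and the added <code>\b</code> word boundaries are assumptions made for this illustration.

```python
import re

# Same shape as the Sublime Text pattern, with word boundaries added.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def find_ipv4(text):
    """Return dotted-quad strings whose four parts are all in 0..255."""
    return [
        candidate
        for candidate in IPV4_CANDIDATE.findall(text)
        if all(int(part) <= 255 for part in candidate.split("."))
    ]

found = find_ipv4("ok 192.168.0.1, bad 999.1.1.1, gateway 10.0.0.254.")
print(found)
```

Doing the range check in code keeps the pattern short. A purely regular-expression range check is possible but much longer.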
=== Remove the black squares that Notepad shows in plain-text files (UNIX LF line endings) ===
=== Wrap each element in quotes ===
==== Wrap each element of an array in quotes ====
<pre>
Elmo, Emie, Granny Bird, Herry Monster, 喀喀獸
'Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸'
</pre>
Method 1: PHP
{{exclaim}} This approach does not work if an element contains a newline.
<pre>
$users = array('Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸');

// Wrap each element in single quotes, then join the elements with ", "
$result = implode(", ", preg_replace('/^(.*?)$/', "'$1'", $users));
echo $result . PHP_EOL;

// Wrap each element in double quotes instead
$result = implode(", ", preg_replace('/^(.*?)$/', "\"$1\"", $users));
echo $result . PHP_EOL;
</pre>
Thanks, Joshua! More on [http://melikedev.com/2010/02/24/php-wrap-implode-array-elements-in-quotes/ PHP - Wrap Implode Array Elements in Quotes » Me Like Dev]
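Method 2: [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor]
* Find: {{kbd | key = <nowiki>([^,\s][^,\n]*)</nowiki>}} (an element starts at the first non-space character and runs up to the next comma, so multi-word elements such as Granny Bird stay together; the simpler {{kbd | key = <nowiki>([^\s,]+)</nowiki>}} would quote Granny and Bird separately)
* Quote character
** Single quotes around each element: Replace with: {{kbd | key = <nowiki>'\1'</nowiki>}}
** Double quotes around each element: Replace with: {{kbd | key = <nowiki>"\1"</nowiki>}}
Method 3: [https://notepad-plus-plus.org/ Notepad++] with the search mode set to "Regular expression"
* Find: {{kbd | key = <nowiki>([^,\s][^,\n]*)</nowiki>}}
* Quote character
** Single quotes around each element: Replace with: {{kbd | key = <nowiki>'$1'</nowiki>}}
** Double quotes around each element: Replace with: {{kbd | key = <nowiki>"$1"</nowiki>}}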
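As a cross-check of the editor-based methods, here is a minimal Python sketch of quoting each comma-separated element. The pattern <code>([^,\s][^,\n]*)</code> is an assumption chosen for this illustration: it starts an element at the first non-space character and stops at the next comma, so elements that contain a space are kept whole.

```python
import re

elements = "Elmo, Emie, Granny Bird, Herry Monster, 喀喀獸"

# One element = a non-space, non-comma character followed by anything
# up to (but not including) the next comma or line break.
ELEMENT = r"([^,\s][^,\n]*)"

single_quoted = re.sub(ELEMENT, r"'\1'", elements)
double_quoted = re.sub(ELEMENT, r'"\1"', elements)

print(single_quoted)
print(double_quoted)
```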
<div style="float: left; width: 100%; position: relative; display: block; clear: left;"> | |||
<div style="width: 46%; float: left; margin:0 auto; position: relative; display: block; "> | |||
==== Wrap each line in quotes and remove the line breaks ====
<pre> | |||
// before | |||
Elmo | |||
Emie | |||
Granny Bird | |||
// after | |||
'Elmo', 'Emie', 'Granny Bird' | |||
</pre> | |||
Method 1: [http://www.sublimetext.com/ Sublime Text], [https://notepad-plus-plus.org/downloads/ Notepad++] or [https://zh-tw.emeditor.com/ EmEditor]. This method also handles one or more spaces at the start or end of a line.
On {{Mac}}:
* Find what: {{kbd | key = <nowiki>^[ \t]*(\S.*?)[ \t]*\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}} <br />(to wrap in double quotes instead, Replace with: {{kbd | key = <nowiki>"\1", </nowiki>}})
On {{Win}}, change the newline {{kbd | key = <nowiki>\n</nowiki>}} to {{kbd | key = <nowiki>\r\n</nowiki>}}:
* Find what: {{kbd | key = <nowiki>^[ \t]*(\S.*?)[ \t]*\r\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}} <br />(to wrap in double quotes instead, Replace with: {{kbd | key = <nowiki>"\1", </nowiki>}})
Method 2: [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor] {{exclaim}} This method does not handle spaces at the end of a line
* Find what: {{kbd | key = <nowiki>(.*)$\n</nowiki>}} (or, when no line contains a space, {{kbd | key = <nowiki>(\S+)$\n</nowiki>}} or {{kbd | key = <nowiki>(\S+)\n</nowiki>}})
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}}
More details on the page [[Add quotation at the start and end of each line | add quotation at the start and end of each line]]. | |||
</div> | |||
<div style="width: 46%; float: left; margin:0 auto; position: absolute; display: block; left: 54%; top: 0;"> | |||
==== Turn quoted text back into one item per line, removing the separator (,) ====
<pre> | |||
// before | |||
'Elmo', 'Emie', 'Granny Bird' | |||
// after | |||
Elmo | |||
Emie | |||
Granny Bird | |||
</pre> | |||
Method: [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor]. This method also handles the space that may follow each comma.
* Find what: {{kbd | key = <nowiki>'([^',]+)',?\s?</nowiki>}}
* Replace with: {{kbd | key = <nowiki>\1\n</nowiki>}}
</div> | |||
</div> | |||
<div style="clear:both;"> </div> | |||
=== Wrap spreadsheet cell values in double quotes ===
* [https://errerrors.blogspot.com/2019/03/how-to-enclose-non-empty-cell-with-double-quotes-in-google-spreadsheet.html Google 試算表的文字類型欄位值的前後加上雙引號] | |||
=== Find non-ASCII characters 尋找中文、非英文的文字 === | |||
==== Find non-ASCII characters in Google sheet ==== | |||
Applies to: the regular-expression functions of Google Sheets (Google Drive), for example [https://support.google.com/docs/answer/3098292?hl=zh-Hant REGEXMATCH], [https://support.google.com/docs/answer/3098244?hl=en REGEXEXTRACT] and [https://support.google.com/docs/answer/3098245?hl=zh-Hant RegExReplace], and to the Notepad++ search
<pre>
[^\x00-\x7F]+
</pre>
==== Find non-ASCII characters in LibreOffice ====
Applies to: the [https://zh-tw.libreoffice.org/ LibreOffice] [https://help.libreoffice.org/6.2/en-US/text/scalc/01/func_regex.html REGEX] function<ref>[https://help.libreoffice.org/6.2/en-US/text/shared/01/02100001.html?&DbPAR=WRITER&System=MAC List of Regular Expressions]</ref> and the Multi-Rename tool of Total Commander<ref>To replace non-English text while leaving the . character alone: <nowiki>[^\u0000-\u007F.]+ </nowiki></ref><ref>[http://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters javascript - Regular expression to match non-english characters? - Stack Overflow]</ref>
<pre>
[^\u0000-\u007F]+
</pre>
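For comparison, a minimal Python sketch of the same idea: a negated character class over the ASCII range U+0000 to U+007F. The sample text and variable names are only illustrative. <code>str.isascii()</code> is a standard string method (Python 3.7 and later) that answers the simpler yes/no question.

```python
import re

text = "What Does the Fox Say? 12 狐狸怎叫 34"

# One or more consecutive characters outside the ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")

non_ascii_runs = NON_ASCII.findall(text)
has_non_ascii = not text.isascii()

print(non_ascii_runs, has_non_ascii)
```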
==== Find Chinese characters in Google sheet ====
Example: if A2 contains any Chinese character, the cell shows 「中文」 (Chinese); if it contains none, the cell shows 「英文」 (English):
<pre>
=IF(REGEXMATCH(A2, "[\一-\龥]"), "中文", "英文")
</pre>
{{exclaim}} Google Sheets does not accept the following syntax and reports that the expression is not a valid regular expression (「... 是無效的規則運算式。」)
* {{kbd | key=<nowiki>[\u4e00-\u9fa5]</nowiki>}} | |||
* {{kbd | key=<nowiki>[^\u4e00-\u9fa5]</nowiki>}} | |||
* {{kbd | key=<nowiki>[\p{Script=Hans}]</nowiki>}} | |||
* {{kbd | key=<nowiki>[\p{Han}]</nowiki>}} | |||
==== Find Chinese characters in MySQL ==== | |||
Find rows whose `column_name` value contains Chinese characters. Applies to: MySQL<ref>[https://stackoverflow.com/questions/9795137/how-to-detect-rows-with-chinese-characters-in-mysql How to detect rows with chinese characters in MySQL? - Stack Overflow]</ref><ref>[https://stackoverflow.com/questions/401771/how-can-i-find-non-ascii-characters-in-mysql How can I find non-ASCII characters in MySQL? - Stack Overflow]</ref>
<pre> | |||
SELECT `column_name` | |||
FROM `table_name` | |||
WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])'; | |||
</pre> | |||
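Explanation
* The pattern '^(..)*(E[4-9])' is applied to the hexadecimal form of the value, in which every byte is written as two characters. Starting at the beginning of the string (^), it skips zero or more whole bytes ((..)*) and then requires a byte that starts with E4 to E9 ((E[4-9])). These are the UTF-8 lead bytes for U+4000 to U+9FFF, a range that includes the common CJK ideographs (U+4E00 to U+9FFF).
* The ^(..)* prefix makes the condition stricter: E[4-9] has to begin on a byte boundary. Without it, the pattern would also match an E followed by 4 to 9 that merely straddles two neighbouring bytes (for example the bytes 4E 41, written 4E41), which gives false positives for values that contain no Chinese at all.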
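The effect of the <code>^(..)*</code> prefix can be reproduced outside MySQL. The sketch below is an illustration in Python, not the MySQL implementation: <code>hex_upper</code> is a hypothetical helper that mimics <code>HEX()</code> for UTF-8 text, and the two compiled patterns compare the byte-aligned and the unaligned search.

```python
import re

def hex_upper(text):
    """Mimic MySQL HEX() for a UTF-8 string: two hex digits per byte."""
    return text.encode("utf-8").hex().upper()

ALIGNED = re.compile(r"^(..)*(E[4-9])")  # E4..E9 must start on a byte boundary
UNALIGNED = re.compile(r"E[4-9]")        # E4..E9 anywhere in the hex string

chinese_hex = hex_upper("Fox 狐狸")  # 狐 is encoded as the bytes E7 8B 90
ascii_hex = hex_upper("NASA")        # 4E 41 53 41: "E4" straddles 4E and 41

print(chinese_hex, bool(ALIGNED.search(chinese_hex)))
print(ascii_hex, bool(UNALIGNED.search(ascii_hex)), bool(ALIGNED.search(ascii_hex)))
```

The pure-ASCII string NASA is flagged by the unaligned pattern but not by the aligned one, which is the false positive the prefix is meant to prevent.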
==== Find non-ASCII characters in MySQL ==== | |||
Find rows whose `column_name` value is not made up entirely of ASCII characters
<pre> | |||
SELECT `column_name` | |||
FROM `table_name` | |||
WHERE `column_name` <> CONVERT(`column_name` USING ASCII) | |||
</pre> | |||
==== Find non-ASCII characters in PHP ==== | |||
Find values that contain Chinese characters. Both Traditional and Simplified Chinese are covered. Punctuation (e.g. {{kbd | key = <nowiki>,</nowiki>}}), full-width punctuation (e.g. {{kbd | key = <nowiki>,</nowiki>}}) and special symbols such as the emoji {{kbd | key = ⭐}} are not.
PHP: exact match
<pre>
$string = '繁體中文'; // sample input

// approach 1: the code point range U+4E00 to U+9FA5
if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $string)) {
    echo "All characters are Chinese" . PHP_EOL;
} else {
    echo "Some characters are not Chinese" . PHP_EOL;
}

// approach 2: the Unicode script property \p{Han}
if (preg_match('/^[\p{Han}]+$/u', $string)) {
    echo "All characters are Chinese" . PHP_EOL;
} else {
    echo "Some characters are not Chinese" . PHP_EOL;
}
</pre>
partial match ([http://sandbox.onlinephpfunctions.com/code/d780845d20877c0fd2e693b28ed02a10d250d39e online demo] hosted by [http://sandbox.onlinephpfunctions.com/ PHP Sandbox]) | |||
<pre> | |||
// approach 1 | |||
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐'; | |||
$pattern = '/[\p{Han}]+/u'; | |||
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE); | |||
var_dump($matches); | |||
// approach 2 | |||
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐'; | |||
$pattern = '/[\x{4e00}-\x{9fa5}]+/u'; | |||
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE); | |||
var_dump($matches); | |||
</pre> | |||
Troubleshooting: error message
<pre>preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 8</pre>
Fix: add the {{kbd | key = u }} modifier to the pattern passed to [http://php.net/manual/en/function.preg-match.php preg_match()]<ref>[https://stackoverflow.com/questions/32375531/preg-match-compilation-failed-character-value-in-x-or-o-is-too-large-a php - preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 27 on line number 25 - Stack Overflow]</ref>.
==== Find non-ASCII characters in JavaScript ==== | |||
* [https://stackoverflow.com/questions/21109011/javascript-unicode-string-chinese-character-but-no-punctuation regex - Javascript unicode string, chinese character but no punctuation - Stack Overflow] | |||
References:
* [http://blog.csdn.net/tinyletero/article/details/8201465 unicode编码 \u4e00-\u9fa5 匹配所有中文 - CSDN博客] | |||
* [https://stackoverflow.com/questions/38168419/codeigniter-form-validation-for-chinese-words php - CodeIgniter Form Validation for Chinese Words - Stack Overflow] | |||
* [https://zh.wikipedia.org/zh-tw/%E4%B8%AD%E6%97%A5%E9%9F%93%E7%B5%B1%E4%B8%80%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97%E5%88%97%E8%A1%A8 中日韓統一表意文字列表 - 維基百科,自由的百科全書] | |||
=== Find English text ===
==== Find ASCII characters in MySQL ====
<pre> | |||
-- Find rows whose `my_column` value consists of ASCII characters only
SELECT * | |||
FROM `my_table` | |||
WHERE `my_column` LIKE CONVERT(`my_column` USING ASCII) | |||
</pre> | |||
Related articles
* [https://errerrors.blogspot.com/2020/07/search-app-not-apple-in-englishsearching.html 解決英文字的搜尋:搜尋 app 而不是 apple]
References
* [https://stackoverflow.com/questions/401771/how-can-i-find-non-ascii-characters-in-mysql How can I find non-ASCII characters in MySQL? - Stack Overflow] | |||
==== Find letters, digits, hyphens (-) or underscores (_) in MySQL ====
<pre>
-- Find rows whose `my_column` value contains a letter, digit, hyphen (-) or underscore (_)
SELECT *
FROM `my_table`
WHERE `my_column` REGEXP '[a-zA-Z0-9_-]'
</pre>
=== Add a comma at the start of each line ===
[[Adding characters to document lines]] | |||
=== You know how the text starts and ends, but not what is in between ===
# Search mode: select "Regular expression"
## Find what: {{kbd | key=a(.*)le}} finds text that starts with a and ends with le, with anything (spaces included) in between, e.g. (1) apple (2) apps lesson. Because {{kbd | key=<nowiki>.*</nowiki>}} is greedy, the match runs to the last le on the line; use {{kbd | key=<nowiki>a(.*?)le</nowiki>}} for the shortest match. {{exclaim}} When searching for Chinese strings, converting the document to UTF-8 encoding is recommended
=== Remove blank lines ===
<pre>
# (before) lines may be separated by one or more blank lines
尼歐

崔妮蒂


莫斐斯

史密斯

祭師
# (after) every line directly follows the previous one
尼歐
崔妮蒂
莫斐斯
史密斯
祭師
</pre>
Remove one or more blank lines (a "blank" line may still contain one or more {{kbd | key= SPACE}} or {{kbd | key= TAB}} characters)
* Tool: Sublime Text and EmEditor, with "Use Regular Expression" enabled. {{exclaim}} The syntax below does not work in Notepad++<ref>[http://www.sitepoint.com/forums/showthread.php?448843-Regex-delete-multiple-blank-lines Regex: delete multiple blank lines]</ref>
** Find: {{kbd | key=<nowiki>^[\s\t]*$\n</nowiki>}} --> Replace with: nothing (leave the field empty). The {{kbd | key=<nowiki>\t</nowiki>}} is redundant, because {{kbd | key=<nowiki>\s</nowiki>}} already includes tabs
* Tool: Notepad++ v7.8.7
** Notepad++ menu: Edit -> Line Operations -> Remove Empty Lines (Containing Blank characters)<ref>[http://stackoverflow.com/questions/3866034/removing-empty-lines-in-notepad regex - Removing empty lines in Notepad++ - Stack Overflow]</ref>
* For details, see [[Regular replace blank lines]]
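A minimal Python sketch of the same clean-up, for illustration only. It uses <code>[ \t]*</code> instead of <code>\s*</code> so that the pattern never runs across a line break, and <code>re.MULTILINE</code> so that <code>^</code> matches at the start of every line. The sample text and variable names are assumptions.

```python
import re

text = "尼歐\n\n崔妮蒂\n \t\n\n莫斐斯\n"

# A blank line: optional spaces or tabs, then the line break itself.
BLANK_LINE = re.compile(r"^[ \t]*\r?\n", flags=re.MULTILINE)

compact = BLANK_LINE.sub("", text)
print(compact)
```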
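=== Find non-whitespace text ===
* Find: {{kbd | key=<nowiki>[^\s]+</nowiki>}} [https://regex101.com/r/zH7wV3/1 online demo]
* [https://errerrors.blogspot.com/2022/01/avoid-whitespace-character-caused-program-stop-abnormally.html 解決遇到空白段落發生程式異常錯誤而執行中斷的問題] "... a character that looks blank but cannot be removed with the TRIM function may be some other kind of whitespace character. The fix is to check whether the paragraph contains any Chinese or English characters or digits before processing it further."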
=== Remove punctuation and special symbols ===
* [https://stackoverflow.com/questions/5689918/php-strip-punctuation/5689989 regex - PHP strip punctuation - Stack Overflow] | |||
=== Put text separated by a given symbol on separate lines ===
Example:
<pre>
# (before) text separated by the enumeration comma (、)
尼歐、莫斐斯、崔妮蒂、史密斯、祭師
# (after) one item per line
尼歐
莫斐斯
崔妮蒂
史密斯
祭師
</pre>
Using [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor]:
* Find: {{kbd | key = <nowiki>([^、]+)([、]{1})</nowiki>}}
* Replace with: {{kbd | key = <nowiki>\1\n</nowiki>}}
Syntax notes
* <nowiki>[^、]</nowiki> : any single character that is not the enumeration comma (、)
* <nowiki>[^、]+</nowiki> : one or more characters that are not 、
* <nowiki>([^、]+)</nowiki> : a capture group holding that run of non-、 characters (referenced as \1)
* <nowiki>[、]</nowiki>: exactly one 、 character
* <nowiki>[、]{1}</nowiki> : 、 exactly once (the {1} is redundant)
* <nowiki>([、]{1})</nowiki> : a capture group holding that single 、
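The same find-and-replace can be sketched in Python, purely as an illustration: <code>re.sub</code> applies the pattern from this section unchanged, and <code>re.split</code> is shown as a simpler alternative when a list is wanted instead of text. The variable names are hypothetical.

```python
import re

text = "尼歐、莫斐斯、崔妮蒂、史密斯、祭師"

# Group 1 is the run of non-separator characters; the separator itself
# is replaced by a newline.
one_per_line = re.sub(r"([^、]+)([、]{1})", r"\1\n", text)

# Alternative: split on the separator and keep the pieces as a list.
names = re.split("、", text)

print(one_per_line)
print(names)
```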
=== Add a (half-width) space at the end of each line ===
Method 1: for Sublime Text and EmEditor
# Menu: Search -> Replace
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>\n</nowiki>}}
## Replace with: {{kbd | key = <nowiki>_\n</nowiki>}} (replace the _ in front of {{kbd | key = <nowiki>\n</nowiki>}} with a half-width space)
# click "Replace all"
Method 2: for Sublime Text and EmEditor
# Menu: Search -> Replace
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>$</nowiki>}}
## Replace with: a single half-width space ({{kbd | key = <nowiki>$</nowiki>}} matches the empty position at the end of each line, so nothing else is needed)
# click "Replace all"
{{exclaim}} With method 1, check whether the file ends with an empty line. If it does not, the last line has no {{kbd | key = <nowiki>\n</nowiki>}} and the replacement is not applied to it
=== Replace the spaces inside each line with Tab characters ===
Turn field values separated by spaces into field values separated by Tabs. The result can be pasted straight into MS Excel or [[Google spreadsheet]].
<pre># \t stands for the Tab key
# before
aaa bbb ccc
# after
aaa\tbbb\tccc
</pre>
Explanation: \S is any non-whitespace character and \r\n is the [[Return symbol | newline]]. [^\S\r\n] therefore means "not a non-whitespace character and not a newline", in other words whitespace other than a line break.
Using Sublime Text (references<ref>[http://www.techrepublic.com/blog/microsoft-office/quickly-replace-multiple-space-characters-with-a-tab-character/ Quickly replace multiple space characters with a tab character - TechRepublic]</ref> <ref>[http://stackoverflow.com/questions/3469080/match-whitespace-but-not-newlines-perl regex - Match whitespace but not newlines (Perl) - Stack Overflow]</ref>)
# Menu: Search -> Replace
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>([^\S\n]+)</nowiki>}} or {{kbd | key = <nowiki>([^\S\r\n]+)</nowiki>}} or {{kbd | key = <nowiki>_{1,}</nowiki>}} (replace _ with a half-width space). {{exclaim}} {{kbd | key = <nowiki>\s</nowiki>}} covers line breaks as well as spaces, so {{kbd | key = <nowiki>\s+</nowiki>}} cannot be used directly as the search pattern; {{kbd | key = <nowiki>\s\s+</nowiki>}} (two or more whitespace characters) has the same limitation and can match across line breaks
## Replace with: {{kbd | key = <nowiki>\t</nowiki>}}
# click "Replace all"
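The character class <code>[^\S\r\n]</code> behaves the same way in Python's <code>re</code> module, so the replacement can be checked with a short sketch. The sample text and variable names are illustrative.

```python
import re

text = "aaa   bbb ccc\nddd  eee"

# Whitespace that is not a line break: the double negative [^\S...]
# means "whitespace", and \r\n are then excluded explicitly.
tab_separated = re.sub(r"[^\S\r\n]+", "\t", text)

print(repr(tab_separated))
```

Each run of spaces collapses to one Tab while the line break between the two rows is left alone.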
=== Remove spaces at the start or end of each line ===
==== Remove spaces at the start of each line ====
* Find: {{kbd | key = <nowiki>^\s+</nowiki>}} --> Replace with: nothing (leave the field empty) (for Sublime Text and EmEditor, with "Use Regular Expression" enabled)
<pre># before
  aaa
    bbb
 ccc
# after
aaa
bbb
ccc
</pre>
==== Remove spaces at the end of each line ====
* Find: {{kbd | key = <nowiki>\s+$</nowiki>}} --> Replace with: nothing (leave the field empty) (for Sublime Text and EmEditor, with "Use Regular Expression" enabled)
==== Remove spaces at the start or end of each line ====
* Find: {{kbd | key = <nowiki>(^\s+|\s+$)</nowiki>}} --> Replace with: nothing (leave the field empty) (for Sublime Text and EmEditor, with "Use Regular Expression" enabled)
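A minimal Python sketch of trimming every line, under the assumption that only spaces and tabs should be removed. It uses <code>[ \t]+</code> rather than <code>\s+</code> so that blank lines and line breaks are left untouched, and <code>re.MULTILINE</code> so that <code>^</code> and <code>$</code> apply to each line.

```python
import re

text = "  aaa \n\tbbb\nccc  "

# Leading spaces/tabs at a line start, or trailing spaces/tabs at a line end.
trimmed = re.sub(r"^[ \t]+|[ \t]+$", "", text, flags=re.MULTILINE)

print(repr(trimmed))
```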
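=== Find lines that contain something other than digits ===
Every line is expected to hold digits only. This finds any character that is neither a digit nor a line break (inside a character class, | is a literal character, so it is not needed here):
<pre>
[^\d\n]
</pre>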
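When the data is processed line by line, the line break no longer needs to be excluded, and <code>\D</code> (any non-digit) is enough. A minimal Python sketch, with made-up sample data:

```python
import re

data = "123\n45a6\n789\n12 3"

# Keep every line in which at least one non-digit character occurs.
suspicious_lines = [line for line in data.splitlines() if re.search(r"\D", line)]

print(suspicious_lines)
```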
=== Find hashtags ===
[[Extract all hashtags from text]]
=== Find URLs in text ===
[[Extract url from text]]
=== Find numbers ===
See [[Data cleaning#Numeric]]
* [[Extract large number from text | Find long numbers in text]]
* [https://errerrors.blogspot.com/2020/02/convert-minguo-calendar-to-common-era-using-google-sheet.html Google 試算表將民國轉西元日期]
=== Remove text within brackets ===
See [[Remove text within brackets]]
=== Search unmatched string === | |||
Goal: find console.log calls that are not commented out.
Original text: some lines contain un-commented [[Javascript debug]] output
<pre> | |||
console.log("un-commented debug information"); | |||
//console.log("commented debug information"); | |||
</pre> | |||
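Search pattern: match "console.log" only when the character in front of it is not the / symbol. Because [^/] has to consume one character, a console.log at the very start of the text is not matched.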
<pre> | |||
[^/](console\.log) | |||
</pre> | |||
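An alternative that does not need a character in front of the match is a negative lookahead anchored at the start of the line. The sketch below is one possible variant in Python, not the pattern used above: it skips lines whose first non-blank characters are <code>//</code>, and it would still report a call that follows a trailing <code>//</code> comment later on the same line.

```python
import re

source = "\n".join([
    'console.log("un-commented debug information");',
    '//console.log("commented debug information");',
    '  // console.log("indented comment");',
    'run(); console.log("after a statement");',
])

# The line must not begin (after optional indentation) with "//".
ACTIVE_LOG = re.compile(r"^(?![ \t]*//).*console\.log")

active_lines = [line for line in source.splitlines() if ACTIVE_LOG.search(line)]
print(active_lines)
```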
== Text editor with support for regular expression == | |||
[[Text editor with support for regular expression]] | |||
== Regular expression batch tools == | |||
Running '''multiple''' regular expression operations on the same file
* {{Gd}} [https://github.com/facelessuser/RegReplace RegReplace] runs several replace commands in sequence: "Simple find and replace sequencer plugin for Sublime Text" (quoted from the official webpage). {{access | date=2014-10-25}}
* ''$'' [https://www.emeditor.com/text-editor-features/more-features/batch-replace/ EmEditor (Text Editor) - Batch Replace] & [https://zh-tw.emeditor.com/text-editor-features/coding/regular-expressions/ EmEditor (文字編輯器) | 規則運算式]
Running one regular expression operation on '''multiple''' files
* ''$'' [https://www.emeditor.com/text-editor-features/more-features/find-replace/ EmEditor (Text Editor) | Find and Replace] | |||
== syntax ==
* Tab character: \t (works in the Notepad++ "Extended" search mode, 增強模式)
* Digit: \d (works in the Notepad++ "Regular expression" search mode, 用類型表式. {{exclaim}} Not supported in the "Extended" search mode)
* {{kbd | key=<nowiki>\S</nowiki>}} Non-whitespace character: matches neither the half-width space nor, in Unicode-aware engines, the full-width space
== Troubleshooting of regular expression ==
Tips
* Use the online tool [https://regex101.com/ regex101: build, test, and debug regex] to get an explanation of your pattern.
* Test on small data: (1) prepare a small sample file to verify the pattern, (2) try it in the [[Regular_expression#Regular_expression_online_tools | online tools]].
* Highlight or output the matched text, e.g. {{kbd | key=<nowiki>--color</nowiki>}}<ref>[https://www.cyberciti.biz/faq/howto-use-grep-command-in-linux-unix/grep_command_examples/ Grep -color command Examples - nixCraft]</ref> for the grep command, or print the matches with the PHP [http://php.net/manual/en/function.preg-match.php preg_match()] function.
* Simplify the pattern.
* Because engines differ in what they support, try an alternative syntax, e.g. replace {{kbd | key=<nowiki>\d</nowiki>}} with {{kbd | key=<nowiki>[0-9]</nowiki>}}.
Related articles
* [https://errerrors.blogspot.com/2015/07/sublime-text-invalid-lookbehind.html Err: 解決 Sublime Text 正則表示式搜尋,遇到的「Invalid lookbehind assertion」錯誤] (fixing the "Invalid lookbehind assertion" error in Sublime Text regex search)
== further reading ==
* [http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Searching_And_Replacing SourceForge.net: Searching And Replacing - notepad-plus], [http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions SourceForge.net: Regular Expressions - notepad-plus]
* [http://stackoverflow.com/questions/23020856/text-extraction-with-sublime-text regex - text extraction with sublime text - Stack Overflow] {{access | date=2014-09-26}}
* [https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F 正規表示式 - 維基百科,自由的百科全書]
* [http://www.regular-expressions.info/ Regular-Expressions.info - Regex Tutorial, Examples and Reference - Regexp Patterns]
* [http://linux.vbird.org/linux_basic/0320bash.php 鳥哥的 Linux 私房菜 -- 第十章、認識與學習BASH] {{access | date = 2016-06-08}}
* [https://stackoverflow.com/questions/3548453/negative-matching-using-grep-match-lines-that-do-not-contain-foo Negative matching using grep (match lines that do not contain foo) - Stack Overflow] {{access | date = 2018-04-06}}
* [https://support.google.com/a/answer/1371415?hl=zh-Hant 規則運算式的語法 - G Suite 管理員說明] {{access | date = 2018-12-06}}
unicode
* [http://www.regular-expressions.info/unicode.html Regex Tutorial - Unicode Characters and Properties] {{access | date = 2014-04-02}}
* Copy to MS Excel 2002 from Google Docs: ok
{{Template:Troubleshooting}}
[[Category:Regular expression]] [[Category:Software]] [[Category:Programming]] [[Category:Data Science]] [[Category:Search]] [[Category:String manipulation]]
Latest revision as of 11:18, 29 February 2024
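Regular expressions make it quick to search for, or replace, strings that match a given rule when processing text files. Strings are processed one line at a time[1]. The Chinese term 正規表示法 is also written 正規表示式, 正規表達式, 正則表達式, 正規運算式, 規則運算式 or 常規表示法[2].
Have a question? Try debugging it yourself with the online tools that explain the syntax. You can also ask on the PTT RegExp board (看板 RegExp 文章列表 - 批踢踢實業坊) or on another Q&A service.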
Quick reference
Notes: (1) on the wiki page the part of each sample that matches the rule is shaded blue; (2) the same rule can often be written in more than one way. The sample text is "What Does the Fox Say? 12 狐狸怎叫 34"; the line-based rules also use "What Does the shiba inu say? 柴犬怎叫".
- Any single character (including a space, but not a newline): .
- Any character (including a space), zero or one time: .? = .{0,1}
- Any characters (including spaces), any number of times: .* = .{0,}
- Any characters (including spaces), at least once: .+ = .{1,}
- One or more whitespace or newline characters: \s+
  - Opposite rule, one or more characters that are neither whitespace nor newline: [^\s]+ = [^\s]{1,} = [\S]+ (or [^ ]+ when the plain space is the only whitespace present)
- One or more ASCII characters (letters, digits, spaces and so on)[3]: [\x00-\x7F]+ or [[:ascii:]]+[4]
  - Opposite rule, one or more non-ASCII characters such as Chinese: [^\x00-\x7F]+
- One or more letters, digits or underscores ( _ ), spaces excluded: [\w]+ = [a-zA-Z0-9_]+ (in PHP, adding the u modifier makes it match Chinese characters as well)
  - Opposite rule, one or more characters that are not letters, digits or underscores: \W+ = [^a-zA-Z0-9_]+
- One or more digits (spaces excluded): [\d]+ = [0-9]+
  - Opposite rule, one or more non-digit characters (spaces included): [^\d]+ = [^0-9]+ = \D+
- One or more Chinese characters: [\p{Han}]+
  - Opposite rule, one or more characters that are not Chinese characters: [^\p{Han}]+
- Lines that start with 狐狸: ^狐狸.*$[5]
  - Opposite rule, lines that do not start with 狐狸: ^(?!狐狸).*$[6]
- Lines that end with 怎叫: ^.*怎叫$
  - Opposite rule, lines that do not end with 怎叫: .*(?<!怎叫)$[7]
- Lines that contain 狐狸: ^.*狐狸.*$ or (狐狸)
  - Opposite rule, lines that do not contain 狐狸: ^((?!狐狸).)*$
- Boolean AND, lines that contain both 狐狸 and 叫[8]: (?=.*狐狸)(?=.*叫).* or 狐狸.*叫|叫.*狐狸
- Boolean OR, lines that contain 狐狸 or 叫: .*(狐狸|叫).*
- Lines that contain neither 狐狸 nor 柴犬: ^((?!狐狸|柴犬).)*$
- Boolean NOT, lines that contain 柴犬 but not 狐狸[9]: ^(?!.*狐狸).*柴犬.*$ The forms ^((?!狐狸).)*(柴犬).*$ and ^(柴犬).*((?!狐狸).)*$ give wrong results when a line contains both 狐狸 and 柴犬.
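The line-based rules in the quick reference can be tried outside an editor. The following is a minimal sketch in Python using the standard `re` module; the sample lines and the `filter_lines` helper are illustrative assumptions, not part of the original page. The "contains 柴犬 but not 狐狸" rule is written here with a single leading lookahead, `^(?!.*狐狸).*柴犬.*$`.

```python
import re

# Illustrative sample lines: one mentions 狐狸, one mentions 柴犬, one mentions both.
LINES = [
    "What Does the Fox Say? 12 狐狸怎叫 34",
    "What Does the shiba inu say? 柴犬怎叫",
    "狐狸 and 柴犬",
]

def filter_lines(pattern, lines):
    """Return the lines that the pattern matches from start to end."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.fullmatch(line)]

# Lines that contain 狐狸
contains_fox = filter_lines(r"^.*狐狸.*$", LINES)
# Lines that do not contain 狐狸
without_fox = filter_lines(r"^((?!狐狸).)*$", LINES)
# Boolean AND: lines that contain both 狐狸 and 叫
fox_and_bark = filter_lines(r"(?=.*狐狸)(?=.*叫).*", LINES)
# Boolean NOT: lines that contain 柴犬 but not 狐狸
shiba_only = filter_lines(r"^(?!.*狐狸).*柴犬.*$", LINES)
```

Each lookahead is checked at the start of the line without consuming text, which is why several conditions can be stacked in front of the final `.*`.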
Regular expression online tools
Websites for testing regular expression syntax
- RegEx101 "Online regex tester and debugger: PHP, PCRE, Python, Golang and JavaScript" (example). Explains the syntax. Tutorial: RegEx101正規表示法線上產生器,有沒有選到立馬告訴你|梅問題.教學網
- RegExr: Learn, Build, & Test RegEx (example). Explains the syntax. Tutorial: RegExr: 功能強大的正規式撰寫協助工具
- Regexper: explains the syntax with a diagram, e.g. \d{3}(.*)
- Regulex:JavaScript Regular Expression Visualizer: explains the syntax with a diagram, e.g. ^(a|b)*?$
- Rubular: a Ruby regular expression editor and tester (example)
- PHP Live Regex [Last visited: 2014-11-25]
- Regex Tester and Debugger Online - Javascript, PCRE, PHP [Last visited: 2016-01-07]
- Regular Expression (RegExp) in JavaScript - 石頭閒語 [Last visited: 2017-11-14]
Examples
- Regular Expression Library: pattern examples contributed by users
cases
Replace newlines with commas
Turn a list of email addresses into a recipient list that an email client can use
Before: [email protected] [email protected] [email protected] After: [email protected],[email protected],[email protected]
Option 1: Sublime Text, EmEditor
The syntax works in Sublime Text and EmEditor (the steps below are for EmEditor)
- Menu: Search -> Replace
- click "Use Regular Expression"
- Find: \n (the newline character. Windows ends lines with \r\n; Linux and macOS end lines with \n; classic Mac OS, version 9 and earlier, used \r. \n is the part that Windows and Unix-like systems share.)
- Replace with: ,
- click "Replace all"
Join the lines into one, separated by commas
// before
Elmo
Emie
Granny Bird
// after
Elmo, Emie, Granny Bird
Method: use Sublime Text or EmEditor.
- Find what: \n
- Replace with: , followed by a half-width space. This joins the lines with a comma plus a space. To use another separator, such as the ideographic comma, use Replace with: 、
Turn comma-separated text back into one item per line and drop the separator (,)
// before
Elmo, Emie, Granny Bird
// after
Elmo
Emie
Granny Bird
Method: use Sublime Text or EmEditor. Each output line may start with a space.
- Find what: ([^,]+),
- Replace with: \1\n
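The two editor recipes above (join lines with a comma, then split them back) can be sketched in Python. This is an illustrative sketch only; the function names `join_lines` and `split_items` are assumptions made for this example.

```python
import re

def join_lines(text, separator=", "):
    """Replace each line break (\\r\\n or \\n) with the separator."""
    # Trailing line breaks are dropped first so the result does not end with a separator.
    return re.sub(r"\r?\n", separator, text.strip("\r\n"))

def split_items(text):
    """Put each comma-separated item on its own line, mirroring ([^,]+), -> \\1\\n."""
    lines = re.sub(r"([^,]+),", r"\1\n", text).split("\n")
    # The editor recipe can leave a leading space on each line; strip it here.
    return "\n".join(line.strip() for line in lines)

joined = join_lines("Elmo\nEmie\nGranny Bird\n")
restored = split_items(joined)
```

Matching `\r?\n` rather than `\n` alone lets the same call handle both Windows and Unix line endings.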
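Option 2: Notepad++
Using Notepad++
- Menu: Search -> Replace
- Search mode: select "Extended" (增強模式), not "Regular expression" (用類型表式)
- Find what: \n (the newline character)
- Replace with: ,
- Click Replace All
Related: How To Replace Line Ends, thus changing the line layout [Last visited: 2010-01-27]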
Option 3: Microsoft Word
Using Microsoft Word 2002
- Menu: Edit -> Replace
- Enable the extended search mode (增強模式)
- Find what: ^p (paragraph mark)
- Replace with: ,
- Click Replace All
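Option 4: sed command on Linux
sed 's/string to replace/new string/g' old.filename > new.filename[10]
(1) String to replace: \n, preceded by :a;N;$!ba; which first reads every line into the pattern space (2) New string: ; followed by a space
sed ':a;N;$!ba;s/\n/; /g' old.filename > new.filename [11]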
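Option 5: an editor with hexadecimal (HEX) editing
Use an editor that supports hexadecimal (HEX) editing, for example iHex - Hex Editor for Mac
- Menu: Edit -> Find
- Find: 0A (the newline character)
- Replace: 2c 20 (2c is the comma, 20 is the space)
- Save the file
Related resources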
Find IP address (IPv4)
For Notepad++ v.5.9.5
- Menu: Search -> Replace
- Search mode: select "Regular expression" (用類型表式)
- Find what: \d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?
Note: this version does not support the {n} syntax.
For Sublime Text v. 3.2.21
- Find: (?:\d{1,3}\.){3}\d{1,3}
References:
- How to Find or Validate an IP Address [Last visited: 2019-06-05]
- SourceForge.net: Notepad++: Regular expression for IP addresses
- regex - Regular expression that matches valid IPv6 addresses - Stack Overflow [Last visited: 2015-08-10]
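The `(?:\d{1,3}\.){3}\d{1,3}` pattern accepts any four groups of one to three digits, so it also matches strings such as 999.1.1.1. A minimal Python sketch, assuming a follow-up range check is wanted; the `find_ipv4` helper and the sample text are illustrative and not from the original page.

```python
import re

# Same shape as the Sublime Text pattern above: four dot-separated groups of 1-3 digits.
IPV4_CANDIDATE = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")

def find_ipv4(text):
    """Return dotted-quad strings whose four parts are all in the range 0-255."""
    candidates = IPV4_CANDIDATE.findall(text)
    return [ip for ip in candidates
            if all(int(part) <= 255 for part in ip.split("."))]

found = find_ipv4("gateway 192.168.0.1, dns 8.8.8.8, bogus 999.1.1.1")
```

Doing the 0-255 check in code keeps the regular expression short; a pure-regex range check is possible but much longer.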
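Remove the black squares shown in Notepad plain-text files (UNIX LF line endings)
Using Notepad++
- Menu: Search -> Replace
- Search mode: select "Extended" (增強模式)
- Find what: \n\n (note: two LF characters)
- Replace with: \r\n (note: CR and LF)
After the replacement, Notepad no longer shows black squares when it opens the file.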
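Wrap each element in quotes
Wrap each element of an array in quotes
Elmo, Emie, Granny Bird, Herry Monster, 喀喀獸
becomes
'Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸'
Method 1: PHP. This does not work if an element contains a newline.
$users = array('Elmo', 'Emie', 'Granny Bird', 'Herry Monster', '喀喀獸');
// wrap each element in single quotes
$result = implode(",", preg_replace('/^(.*?)$/', "'$1'", $users));
// wrap each element in double quotes
$result = implode(",", preg_replace('/^(.*?)$/', "\"$1\"", $users));
echo $result;
Thanks, Joshua! More on PHP - Wrap Implode Array Elements in Quotes » Me Like Dev
Method 2: Sublime Text or EmEditor
- Find: ([^,\s][^,]*) (starts at the first non-space character of an element and runs to the next comma, so multi-word elements such as Granny Bird stay together; inside a character class | is a literal pipe, so it is not needed)
- Quote character
- single quotes around each element, Replace with: '\1'
- double quotes around each element, Replace with: "\1"
Method 3: Notepad++ with the "Regular expression" (用類型表式) search mode enabled
- Find: ([^,\s][^,]*)
- Quote character
- single quotes around each element, Replace with: '$1'
- double quotes around each element, Replace with: "$1"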
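The quoting step can also be scripted. A minimal Python sketch follows; the `quote_items` helper is an illustrative assumption, and the pattern `[^,\s][^,]*` is chosen here so that multi-word items such as Granny Bird are quoted as one element.

```python
import re

def quote_items(text, quote="'"):
    """Wrap each comma-separated item in the given quote character."""
    # [^,\s] anchors the match on the first non-space character of an item;
    # [^,]* then takes the rest of the item, inner spaces included.
    # rstrip() drops any spaces that sit between the item and the next comma.
    return re.sub(r"[^,\s][^,]*",
                  lambda m: quote + m.group(0).rstrip() + quote,
                  text)

quoted = quote_items("Elmo, Emie, Granny Bird, Herry Monster, 喀喀獸")
double_quoted = quote_items("Elmo, Emie", quote='"')
```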
Wrap each line in quotes and remove the line breaks
// before
Elmo
Emie
Granny Bird
// after
'Elmo', 'Emie', 'Granny Bird'
Method 1: Sublime Text, Notepad++ or EmEditor. This method handles lines that may have one or more spaces before or after the text.
On macOS and other systems with \n line endings
- Find what: (\S+)(\s?)+$\n
- Replace with: '\1',
(To use double quotes, Replace with: "\1", )
On Windows, change the newline \n to \r\n
- Find what: (\S+)(\s?)+$\r\n
- Replace with: '\1',
(To use double quotes, Replace with: "\1", )
Method 2: Sublime Text or EmEditor. This method does not handle spaces at the end of a line.
- Find what: (.*)$\n or (\S+)$\n or (\S+)\n
- Replace with: '\1',
More details on the page add quotation at the start and end of each line.
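Turn quoted text back into one item per line and drop the separator (,)
// before
'Elmo', 'Emie', 'Granny Bird'
// after
Elmo
Emie
Granny Bird
Method: use Sublime Text or EmEditor. This method handles an optional space after each comma.
- Find what: '([^']+)',?\s? (inside a character class | and a non-leading ^ are literal characters, so the class only needs to exclude the quote)
- Replace with: \1\n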
Wrap spreadsheet cell values in double quotes
Find non-ASCII characters (Chinese and other non-English text)
Find non-ASCII characters in Google sheet
Works with the regular expression functions of Google Sheets, such as REGEXMATCH, REGEXEXTRACT and REGEXREPLACE, and with Notepad++ search
[^\x00-\x7F]+
Find non-ASCII characters in LibreOffice
Works with the LibreOffice REGEX function[12] and the Total Commander Multi-Rename tool[13][14]
[^\u0000-\u007F]+
Find Chinese characters in Google sheet
Example: if A2 contains any Chinese character, the cell shows 中文 (Chinese); if it contains none, the cell shows 英文 (English):
=IF(REGEXMATCH(A2, "[\一-\龥]"), "中文", "英文")
Google Sheets does not support the following syntax and reports the error "... 是無效的規則運算式。" (not a valid regular expression):
- [\u4e00-\u9fa5]
- [^\u4e00-\u9fa5]
- [\p{Script=Hans}]
- [\p{Han}]
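For comparison, here is a minimal Python sketch of the same two checks. Python's built-in `re` module does not support `\p{Han}`, so the sketch assumes the explicit range U+4E00 to U+9FA5 used elsewhere on this page; the `label` helper and sample strings are illustrative.

```python
import re

# CJK Unified Ideographs from U+4E00 to U+9FA5, written as an explicit range.
HAN = re.compile(r"[\u4e00-\u9fa5]")
# Runs of characters outside the ASCII range 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")

def label(text):
    """Return 中文 if the text contains any Chinese character, else 英文."""
    return "中文" if HAN.search(text) else "英文"

labels = [label("狐狸怎叫"), label("What Does the Fox Say?")]
non_ascii_runs = NON_ASCII.findall("Fox 狐狸 says ⭐")
```

Note that the non-ASCII pattern also picks up symbols such as ⭐, while the Han range does not.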
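Find Chinese characters in MySQL
Find rows whose `column_name` value contains Chinese characters. Works with MySQL[15][16]
SELECT `column_name` FROM `table_name` WHERE HEX(`column_name`) REGEXP '^(..)*(E[4-9])';
Explanation
- HEX() writes every byte of the value as two hexadecimal digits. The pattern '^(..)*(E[4-9])' starts at the beginning of the string (^), skips zero or more whole bytes ((..)*), and then requires a byte whose hex form starts with E4 to E9. These are the lead bytes of the UTF-8 sequences that encode most Chinese characters.
- The ^(..)* prefix makes the condition stricter: E[4-9] has to begin on a byte boundary. Without it, a value could match by accident when the hex digits of two neighbouring bytes happen to contain such a sequence (for example 4E 41 contains "E4" across the boundary).
Find non-ASCII characters in MySQL
Find rows whose `column_name` value is not entirely ASCII
SELECT `column_name` FROM `table_name` WHERE `column_name` <> CONVERT(`column_name` USING ASCII)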
Find non-ASCII characters in PHP
Find values that contain Chinese characters. This covers Traditional and Simplified Chinese, but not punctuation (e.g. ,), full-width punctuation (e.g. ,) or special symbols such as the emoji ⭐.
PHP: exact match
// approach 1
if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $string)) {
    echo "All characters are Chinese characters" . PHP_EOL;
} else {
    echo "Some characters are not Chinese characters" . PHP_EOL;
}
// approach 2
if (preg_match('/^[\p{Han}]+$/u', $string)) {
    echo "All characters are Chinese characters" . PHP_EOL;
} else {
    echo "Some characters are not Chinese characters" . PHP_EOL;
}
partial match (online demo hosted by PHP Sandbox)
// approach 1
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\p{Han}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);
// approach 2
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\x{4e00}-\x{9fa5}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);
Troubleshooting: error message
preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 8
Fix: add the u modifier to the pattern passed to preg_match()[17].
Find non-ASCII characters in JavaScript
References:
- unicode编码 \u4e00-\u9fa5 匹配所有中文 - CSDN博客
- php - CodeIgniter Form Validation for Chinese Words - Stack Overflow
- 中日韓統一表意文字列表 - 維基百科,自由的百科全書
Find English text
Find ASCII characters in MySQL
-- find rows whose `my_column` value consists of ASCII characters only
SELECT * FROM `my_table` WHERE `my_column` LIKE CONVERT(`my_column` USING ASCII)
Related articles
References
Find letters, digits, hyphens (-) or underscores (_) in MySQL
-- find rows whose `my_column` value contains a letter, digit, hyphen (-) or underscore (_)
SELECT * FROM `my_table` WHERE `my_column` REGEXP '[a-zA-Z0-9\-_]'
Add a comma at the start of each line
Adding characters to document lines
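You know how the text starts and ends but forgot the middle
Using Notepad++
- Menu: Search -> Replace
- Search mode: select "Regular expression" (用類型表示)
- Find what: a(.*)le finds text that starts with a and ends with le, such as (1) apple or (2) apps lesson; the middle part may contain spaces. For Chinese text, converting the file to UTF-8 first is recommended.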
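Remove blank lines
# Before: lines may be separated by one or more blank lines
尼歐

崔妮蒂

莫斐斯

史密斯

祭師
# After: each line directly follows the previous one
尼歐
崔妮蒂
莫斐斯
史密斯
祭師
Remove one or more blank lines (a blank line may still contain spaces or tabs)
- Tool: Sublime Text or EmEditor with "Use Regular Expression" enabled. The syntax below does not work in Notepad++[18]
- Find: ^[\s\t]*$\n --> Replace with: nothing (leave the field empty)
- Tool: Notepad++ v7.8.7
- Notepad++ menu: Edit -> Line Operations -> Remove Empty Lines (Containing Blank characters)[19]
- For details, see Regular replace blank lines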
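The same clean-up can be sketched in Python. The pattern used here, `^[^\S\n]*\n` with `re.MULTILINE`, is an assumption made for this sketch rather than the editor pattern above: it matches a line that holds only spaces or tabs, together with its line break.

```python
import re

def remove_blank_lines(text):
    """Drop lines that are empty or contain only spaces and tabs."""
    # [^\S\n]* matches whitespace other than the newline itself,
    # so each match covers exactly one blank line.
    return re.sub(r"^[^\S\n]*\n", "", text, flags=re.MULTILINE)

cleaned = remove_blank_lines("尼歐\n\n崔妮蒂\n \t\n\n莫斐斯\n")
```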
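Find non-whitespace text
- Find: [^\s]+ online demo
- Fixing a program that aborted on blank paragraphs (解決遇到空白段落發生程式異常錯誤而執行中斷的問題): "... characters that look blank but cannot be removed with the TRIM function may be other whitespace characters. The fix is to check whether the paragraph contains any Chinese or English letters or digits before processing it further."
Remove punctuation, special symbols and so on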
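Put text separated by a given symbol on separate lines
Example:
# Before: text separated by the ideographic comma (、)
尼歐、莫斐斯、崔妮蒂、史密斯、祭師
# After: one item per line
尼歐
莫斐斯
崔妮蒂
史密斯
祭師
Using Sublime Text or EmEditor
- Find: ([^、]+)([、]{1})
- Replace with: \1\n
Syntax explained
- [^、] : any single character that is not the ideographic comma (、)
- [^、]+ : one or more characters that are not the ideographic comma (、)
- ([^、]+) : captures the text that matches "one or more characters that are not the ideographic comma (、)"
- [、] : a character class that matches exactly one ideographic comma (、)
- [、]{1} : the ideographic comma (、) exactly once; the {1} is redundant but harmless
- ([、]{1}) : captures the text that matches "the ideographic comma (、) exactly once"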
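The find-and-replace pair above carries over directly to Python, where `\1` in the replacement refers to the first capture group. A minimal sketch; the function name is an illustrative assumption.

```python
import re

def split_on_ideographic_comma(text):
    """Put each item separated by 、 on its own line."""
    # Group 1 is the item, group 2 is the separator that gets dropped.
    return re.sub(r"([^、]+)([、]{1})", r"\1\n", text)

result = split_on_ideographic_comma("尼歐、莫斐斯、崔妮蒂、史密斯、祭師")
```

The last item has no trailing separator, so it is left untouched and ends the output without an extra newline.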
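Add one half-width space at the end of each line
Method 1: works in Sublime Text and EmEditor
- Menu: Search -> Replace
- click "Use Regular Expression"
- Find: \n
- Replace with: _\n (replace the _ before \n with a half-width space)
- click "Replace all"
Method 2: works in Sublime Text and EmEditor
- Menu: Search -> Replace
- click "Use Regular Expression"
- Find: $
- Replace with: _$ (replace the _ before $ with a half-width space)
- click "Replace all"
Check whether the file ends with an empty line. If it does not, the last line has no trailing newline and the rule is not applied to it.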
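Replace the spaces inside each line with a Tab character
Turn space-separated values into tab-separated values, so the result can be pasted straight into MS Excel or a Google spreadsheet.
# \t stands for the Tab character
# before
aaa   bbb   ccc
# after
aaa\tbbb\tccc
Explanation: \S is a non-whitespace character and \r\n is a line break. [^\S\r\n] therefore means "not a non-whitespace character and not a line-break character", in other words whitespace other than line breaks.
Using Sublime Text (references[20] [21])
- Menu: Search -> Replace
- click "Use Regular Expression"
- Find: ([^\S\n]+) or ([^\S\r\n]+) or \s\s+ or _{1,} (replace _ with a half-width space). Because \s covers both spaces and line-break characters, \s+ cannot be used directly as the search pattern
- Replace with: \t
- click "Replace all"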
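A minimal Python sketch of the same replacement, using the `[^\S\r\n]+` form so that line breaks are preserved. The `spaces_to_tabs` name and the sample rows are illustrative assumptions.

```python
import re

def spaces_to_tabs(text):
    """Replace each run of whitespace (except line breaks) with one tab."""
    return re.sub(r"[^\S\r\n]+", "\t", text)

converted = spaces_to_tabs("aaa   bbb ccc\n111  222   333")
```

The double negative reads as "not non-whitespace, and not \r or \n", which leaves spaces and tabs.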
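Remove leading and trailing whitespace from each line
Remove leading whitespace from each line
- Find: ^\s+ --> Replace with: nothing (leave the field empty). Works in Sublime Text and EmEditor with "Use Regular Expression" enabled.
# before
   aaa bbb ccc
# after
aaa bbb ccc
Remove trailing whitespace from each line
- Find: \s+$ --> Replace with: nothing (leave the field empty). Works in Sublime Text and EmEditor with "Use Regular Expression" enabled.
Remove leading or trailing whitespace from each line
- Find: (^\s+|\s+$) --> Replace with: nothing (leave the field empty). Works in Sublime Text and EmEditor with "Use Regular Expression" enabled.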
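A minimal Python sketch of the combined trim. It applies the `^\s+|\s+$` pattern to one line at a time, an assumption made here so that `\s` cannot reach across a line break and swallow empty lines; `trim_lines` is an illustrative name.

```python
import re

def trim_lines(text):
    """Strip leading and trailing whitespace from every line, keeping empty lines."""
    trimmed = [re.sub(r"^\s+|\s+$", "", line) for line in text.split("\n")]
    return "\n".join(trimmed)

trimmed = trim_lines("   aaa bbb ccc  \n\tddd\n\neee ")
```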
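Find lines that contain non-digit characters
Every line is expected to hold digits only. This pattern finds any character that is neither a digit nor a newline. Inside a character class | is a literal pipe, not alternation, so it is left out of the class.
[^\d\n]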
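To report which lines are affected, the check can be run line by line. A minimal Python sketch, assuming the class `[^\d\n]` (any character that is neither a digit nor a newline); the helper name and sample data are illustrative.

```python
import re

# Any character that is neither a digit nor a newline.
NON_DIGIT = re.compile(r"[^\d\n]")

def lines_with_non_digits(text):
    """Return (line_number, line) pairs for lines holding anything but digits."""
    return [(number, line)
            for number, line in enumerate(text.split("\n"), start=1)
            if NON_DIGIT.search(line)]

bad = lines_with_non_digits("123\n45a6\n789\n1|2")
```

The last sample line shows why the pipe is kept out of the class: with `[^\d|\n]` a line such as 1|2 would go unreported.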
Find hashtags
Extract all hashtags from text
Find URLs in article text
Find numbers
Remove text within brackets
See Remove text within brackets
Search unmatched string
Find un-commented console.log calls.
Original format: some lines contain un-commented Javascript debug information.
console.log("un-commented debug information");
//console.log("commented debug information");
Search pattern: find the string "console.log" when the character right before it is not the / symbol. Because [^/] must consume one character, a call at the very beginning of the text is not matched.
[^/](console\.log)
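Where the engine supports lookbehind, the same idea can be written without consuming the preceding character. A minimal Python sketch, assuming the negative lookbehind `(?<!/)` as a substitute for `[^/]`; the variable names are illustrative.

```python
import re

# (?<!/) is a negative lookbehind: the match may not be preceded by a slash.
UNCOMMENTED_LOG = re.compile(r"(?<!/)console\.log")

SOURCE = (
    'console.log("un-commented debug information");\n'
    '//console.log("commented debug information");'
)

# 1-based numbers of the lines that still contain a live console.log call.
hits = [number
        for number, line in enumerate(SOURCE.split("\n"), start=1)
        if UNCOMMENTED_LOG.search(line)]
```

Like the original pattern, this only looks at the single character before console.log, so a comment written as "// console.log" with a space after the slashes is still reported.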
Text editor with support for regular expression
Text editor with support for regular expression
Regular expression batch tools
Multiple regular expression operations on the same file
- RegReplace runs several replace commands in sequence. "Simple find and replace sequencer plugin for Sublime Text" (quoted from the official webpage). [Last visited: 2014-10-25]
- $ EmEditor (Text Editor) - Batch Replace & EmEditor (文字編輯器) | 規則運算式
One regular expression operation on multiple files
syntax
- Newline: \r\n (works in the Notepad++ "Extended" (增強模式) and "Regular expression" (用類型表式) search modes)
- Tab character: \t (works in the Notepad++ "Extended" search mode)
- Digit: \d (works in the Notepad++ "Regular expression" search mode. Not supported in the "Extended" search mode)
- \S Non-whitespace character: matches neither the half-width space nor, in Unicode-aware engines, the full-width space
Troubleshooting of regular expression
Tips
- Use the online tool regex101: build, test, and debug regex to get an explanation of your pattern.
- Test on small data: (1) prepare a small sample file to verify the pattern, (2) try it in the online tools.
- Highlight or output the matched text, e.g. --color[22] for the grep command, or print the matches with the PHP preg_match() function.
- Simplify the pattern.
- Because engines differ in what they support, try an alternative syntax, e.g. replace \d with [0-9].
Related articles
further reading
- SourceForge.net: Searching And Replacing - notepad-plus, SourceForge.net: Regular Expressions - notepad-plus
- regex - text extraction with sublime text - Stack Overflow [Last visited: 2014-09-26]
- 正規表示式 - 維基百科,自由的百科全書
- Regular-Expressions.info - Regex Tutorial, Examples and Reference - Regexp Patterns
- 鳥哥的 Linux 私房菜 -- 第十章、認識與學習BASH [Last visited: 2016-06-08]
- Negative matching using grep (match lines that do not contain foo) - Stack Overflow [Last visited: 2018-04-06]
- 規則運算式的語法 - G Suite 管理員說明 [Last visited: 2018-12-06]
unicode
- Regex Tutorial - Unicode Characters and Properties [Last visited: 2014-04-02]
- PHP: Unicode character properties - Manual [Last visited: 2014-04-02]
references
1. 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理
2. 正規表示式 - 維基百科,自由的百科全書
3. Ascii Table - ASCII character codes and html, octal, hex and decimal chart conversion
4. php - Regex for Any English ASCII Character Including Special Characters - Stack Overflow
5. Regex Examples: Matching Whole Lines of Text That Satisfy Certain Requirements
6. regex - Regular expression to match text that *doesn't* contain a word? - Stack Overflow
7. Regex not ending with - Stack Overflow
8. regex - Regular Expressions: Is there an AND operator? - Stack Overflow
9. regex - Regular expression for a string containing one word but not another - Stack Overflow
10. 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理
11. See unix - sed: How can I replace a newline?
12. List of Regular Expressions
13. To replace non-English text while leaving the . character alone: [^\u0000-\u007F.]+
14. javascript - Regular expression to match non-english characters? - Stack Overflow
15. How to detect rows with chinese characters in MySQL? - Stack Overflow
16. How can I find non-ASCII characters in MySQL? - Stack Overflow
17. php - preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 27 on line number 25 - Stack Overflow
18. Regex: delete multiple blank lines
19. regex - Removing empty lines in Notepad++ - Stack Overflow
20. Quickly replace multiple space characters with a tab character - TechRepublic
21. regex - Match whitespace but not newlines (Perl) - Stack Overflow
22. Grep -color command Examples - nixCraft
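Alternatives
- Tab-separated data pasted into a Google Drive spreadsheet or MS Excel is split into separate columns automatically. So put a Tab before and after each piece of data you want to extract from the raw text, then copy the text and paste it into the spreadsheet; each piece lands in its own column, ready for further processing.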
Copy multiple rows & paste
- Copy to dreamweaver from MS Excel 2002: ok
- Copy to dreamweaver from Google Docs: not ok
- Copy to MS Excel 2002 from Google Docs: ok