Regular expression: Difference between revisions

2,287 bytes added ,  24 January 2019
(22 intermediate revisions by the same user not shown)
Line 1: Line 1:
When processing text files, you can quickly search for or replace strings that match specific rules. Strings are processed one line at a time<ref>[http://linux.vbird.org/linux_basic/0330regularex.php 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理]</ref>. Regular expression (正規表示法) is also known in Chinese as 正規表示式, 正則表達式, 正規運算式, 規則運算式, or 常規表示法<ref>[https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F 正規表示式 - 維基百科,自由的百科全書]</ref>.
When processing text files with a regular expression (正規表示法), you can quickly search for or replace strings that match specific rules. Strings are processed one line at a time<ref>[http://linux.vbird.org/linux_basic/0330regularex.php 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理]</ref>. Regular expression is also known in Chinese as 正規表示式, 正則表達式, 正規運算式, 規則運算式, or 常規表示法<ref>[https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F 正規表示式 - 維基百科,自由的百科全書]</ref>.


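{{Raise hand | text = Have a question? Try debugging it yourself with the [[Regular_expression#Regular_expression_online_tools | online tools]] that explain the syntax. You can also ask on the PTT RegExp board, [http://www.ptt.cc/bbs/RegExp/index.html 看板 RegExp 文章列表 - 批踢踢實業坊], or on another [[問答服務 | Q&A service]]. }}
{{Raise hand | text = Have a question? Try debugging it yourself with the [[Regular_expression#Regular_expression_online_tools | online tools]] that explain the syntax. You can also ask on the PTT RegExp board, [http://www.ptt.cc/bbs/RegExp/index.html 看板 RegExp 文章列表 - 批踢踢實業坊], or on another [[問答服務 | Q&A service]]. }}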
Line 130: Line 130:


== Regular expression online tools ==
== Regular expression online tools ==
* [http://regex101.com/ Online regex tester and debugger: JavaScript, Python, PHP, and PCRE] ([http://regex101.com/r/tH1eT7/1 example]) {{Gd}} Explains the syntax
* {{Gd}} [http://regex101.com/ RegEx101] "Online regex tester and debugger: PHP, PCRE, Python, Golang and JavaScript" ([http://regex101.com/r/tH1eT7/1 example]). Explains the syntax. Tutorial: [https://www.minwt.com/webdesign-dev/html/20352.html RegEx101正規表示法線上產生器,有沒有選到立馬告訴你|梅問題.教學網]
* {{Gd}} [http://gskinner.com/RegExr/ RegExr]: Learn, Build, & Test RegEx ([http://regexr.com/395t0 example]). Explains the syntax. Tutorial: [http://blog.hsdn.net/1426.html RegExr: 功能強大的正規式撰寫協助工具]
* [https://regexper.com/ Regexper]: explains the syntax with diagrams, e.g. [https://regexper.com/#%5Cd%7B3%7D%28.*%29 \d{3}(.*)]
* [https://jex.im/regulex/ Regulex:JavaScript Regular Expression Visualizer]: explains the syntax with diagrams, e.g. [https://jex.im/regulex/#!flags=&re=%5E(a%7Cb)*%3F%24 ^(a|b)*?$]
* [http://www.rubular.com/ Rubular]: a Ruby regular expression editor and tester ([http://www.rubular.com/r/UZuUT5pjeh example])
* [http://www.rubular.com/ Rubular]: a Ruby regular expression editor and tester ([http://www.rubular.com/r/UZuUT5pjeh example])
* [http://gskinner.com/RegExr/ RegExr]: Learn, Build, & Test RegEx ([http://regexr.com/395t0 example]). {{Gd}} Explains the syntax. Tutorial: [http://blog.hsdn.net/1426.html RegExr: 功能強大的正規式撰寫協助工具]
* [http://www.phpliveregex.com/ PHP Live Regex] {{access | date=2014-11-25}}
* [http://www.phpliveregex.com/ PHP Live Regex] {{access | date=2014-11-25}}
* [http://www.gethifi.com/tools/regex HiFi Regex Tester - Live JavaScript Regular Expression Tester] for Javascript {{access | date=2014-12-23}}
* [http://www.gethifi.com/tools/regex HiFi Regex Tester - Live JavaScript Regular Expression Tester] for Javascript {{access | date=2014-12-23}}
Line 142: Line 144:


== cases ==
== cases ==
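=== Convert an email list into a recipient list that email software can use (replace newlines with commas) ===
=== Replace newlines with commas ===
Convert an email list into a recipient list that email software can use.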
<pre>
<pre>
Original
Original
Line 150: Line 153:


Changed to
Changed to
</pre>
</pre>


Line 157: Line 160:
# Menu: Search -> Replace
# Menu: Search -> Replace
# click "Use Regular Expression"
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>\n</nowiki>}} ([https://zh.wikipedia.org/wiki/%E6%8F%9B%E8%A1%8C newline character]. On {{Win}} the newline is {{kbd | key = <nowiki>\r\n</nowiki>}}; on {{Mac}} and {{Linux}} it is {{kbd | key = <nowiki>\n</nowiki>}}, so search for the {{kbd | key = <nowiki>\n</nowiki>}} that both conventions share. Only classic Mac OS (before Mac OS X) used {{kbd | key = <nowiki>\r</nowiki>}}.)
## Find: {{kbd | key = <nowiki>\n</nowiki>}} ([[Return symbol | newline character]]. On {{Win}} the newline is {{kbd | key = <nowiki>\r\n</nowiki>}}; on {{Mac}} and {{Linux}} it is {{kbd | key = <nowiki>\n</nowiki>}}, so search for the {{kbd | key = <nowiki>\n</nowiki>}} that both conventions share. Only classic Mac OS (before Mac OS X) used {{kbd | key = <nowiki>\r</nowiki>}}.)
## Replace with: {{kbd | key = <nowiki>, </nowiki>}}
## Replace with: {{kbd | key = <nowiki>, </nowiki>}}
# click "Replace all"
# click "Replace all"
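The same replacement can also be scripted. The following is a minimal Python sketch, not part of the original steps: the helper name <code>newlines_to_commas</code> and the sample addresses are made up for illustration. It matches all three newline conventions (\r\n, \r and \n), so the result should not depend on the operating system that produced the file.

```python
import re

# Matches Windows (\r\n), classic Mac OS (\r) and Unix/macOS (\n) line endings.
# \r\n is listed first so a Windows line ending is replaced once, not twice.
NEWLINE = re.compile(r"\r\n|\r|\n")


def newlines_to_commas(text: str) -> str:
    """Join the lines of `text` with ", " (hypothetical helper)."""
    # strip() drops the trailing newline so the result has no dangling ", ".
    return NEWLINE.sub(", ", text.strip())


if __name__ == "__main__":
    # Made-up sample data with mixed line endings.
    sample = "a@example.com\r\nb@example.com\nc@example.com\n"
    print(newlines_to_commas(sample))
```

Listing \r\n before the single-character alternatives is the main design choice here: otherwise a Windows line ending would be replaced twice.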
==== Option 2: Notepad++ ====
Use [http://notepad-plus-plus.org/ Notepad++].
# Menu: Search -> Replace
# Search Mode: select "Extended" (not "Regular expression")
## Find what: {{kbd | key = <nowiki>\n</nowiki>}} (newline character)
## Replace with: {{kbd | key = <nowiki>, </nowiki>}}
# Click "Replace All"
Related: [http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Replacing_Newlines How To Replace Line Ends, thus changing the line layout] last visited: 2010-01-27
==== Option 3: Microsoft Word ====
Use Microsoft Word 2002.
# Menu: Edit -> Replace
# Select the extended mode
## Find what: {{kbd | key = <nowiki>^p</nowiki>}} (paragraph mark)
## Replace with: {{kbd | key = <nowiki>, </nowiki>}}
# Click "Replace All"
==== Option 4: sed command for Linux ====
{{kbd | key=<nowiki>sed 's/string-to-replace/new-string/g' old.filename > new.filename</nowiki>}}<ref>[http://linux.vbird.org/linux_basic/0330regularex.php#sed_replace 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理]</ref>
(1) String to replace: \n (the leading :a;N;$!ba; is a loop that reads the whole file into the pattern space, so that s/\n/ can see the newlines)
(2) New string: ; (a semicolon followed by a space)
{{kbd | key=<nowiki>sed ':a;N;$!ba;s/\n/; /g' old.filename > new.filename</nowiki>}} <ref>See [http://stackoverflow.com/questions/1251999/sed-how-can-i-replace-a-newline-n unix - sed: How can I replace a newline? ]</ref>




<div style="float: left; width: 100%; position: relative; display: block; clear: left;">
<div style="float: left; width: 100%; position: relative; display: block; clear: left;">
<div style="width: 46%;  float: left; margin:0 auto; position: relative; display: block; ">
<div style="width: 46%;  float: left; margin:0 auto; position: relative; display: block; ">
==== Remove the newline from each line and separate the lines with commas ====
===== Remove the newline from each line and separate the lines with commas =====
<pre>
<pre>
// before
// before
Line 211: Line 186:
<div style="width: 46%; float: left; margin:0 auto; position: absolute; display: block; left: 54%; top: 0;">
<div style="width: 46%; float: left; margin:0 auto; position: absolute; display: block; left: 54%; top: 0;">


==== Restore comma-separated text to one item per line and remove the delimiter (,) ====
===== Restore comma-separated text to one item per line and remove the delimiter (,) =====
<pre>
<pre>
// before
// before
Line 227: Line 202:
</div>
</div>
</div>
</div>
<div style="clear:both;">&nbsp;</div>
==== Option 2: Notepad++ ====
Use [http://notepad-plus-plus.org/ Notepad++].
# Menu: Search -> Replace
# Search Mode: select "Extended" (not "Regular expression")
## Find what: {{kbd | key = <nowiki>\n</nowiki>}} (newline character)
## Replace with: {{kbd | key = <nowiki>, </nowiki>}}
# Click "Replace All"
Related: [http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Replacing_Newlines How To Replace Line Ends, thus changing the line layout] last visited: 2010-01-27
==== Option 3: Microsoft Word ====
Use Microsoft Word 2002.
# Menu: Edit -> Replace
# Select the extended mode
## Find what: {{kbd | key = <nowiki>^p</nowiki>}} (paragraph mark)
## Replace with: {{kbd | key = <nowiki>, </nowiki>}}
# Click "Replace All"
==== Option 4: sed command for Linux ====
{{kbd | key=<nowiki>sed 's/string-to-replace/new-string/g' old.filename > new.filename</nowiki>}}<ref>[http://linux.vbird.org/linux_basic/0330regularex.php#sed_replace 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理]</ref>


<div style="clear:both;">&nbsp;</div>
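(1) String to replace: \n (the leading :a;N;$!ba; is a loop that reads the whole file into the pattern space, so that s/\n/ can see the newlines)
(2) New string: ; (a semicolon followed by a space)

{{kbd | key=<nowiki>sed ':a;N;$!ba;s/\n/; /g' old.filename > new.filename</nowiki>}} <ref>See [http://stackoverflow.com/questions/1251999/sed-how-can-i-replace-a-newline-n unix - sed: How can I replace a newline? ]</ref>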
 
==== Option 5: Use an editor that supports hexadecimal (HEX) editing ====
 
Use an editor that supports hexadecimal (HEX) editing, e.g. [https://itunes.apple.com/tw/app/ihex-hex-editor/id909566003?mt=12 iHex - Hex Editor] for {{Mac}}
# Menu: Edit -> Find
# Find: {{kbd | key=<nowiki>0A</nowiki>}} (the newline character)
# Replace: {{kbd | key=<nowiki>2c 20</nowiki>}}, where 2c is a comma and 20 is a space
# Save the file
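As a rough illustration of what the hex replacement does at the byte level, here is a minimal Python sketch. The helper name <code>replace_newline_bytes</code> is hypothetical, and the sketch assumes the file uses the single-byte 0A newline (a Windows file would also contain 0D bytes, which are left untouched here).

```python
def replace_newline_bytes(data: bytes) -> bytes:
    """Replace each 0A byte with the two bytes 2C 20 (hypothetical helper)."""
    newline = bytes.fromhex("0a")        # the newline character
    comma_space = bytes.fromhex("2c20")  # 2c is a comma, 20 is a space
    return data.replace(newline, comma_space)


if __name__ == "__main__":
    # Made-up sample bytes standing in for the contents of a file.
    print(replace_newline_bytes(b"a\nb\nc"))
```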
 
Related resources
 
* [https://www.hexdictionary.com/ Hex Dictionary | Convert Hex / Hexadecimal Numbers to Binary and Decimal]


=== Find IP address ===
=== Find IP address ===
Line 301: Line 314:
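Method 1: Use [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor]. This method handles one or more spaces that may appear before or after the text on each line
Method 1: Use [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor]. This method handles one or more spaces that may appear before or after the text on each line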
* Find what: {{kbd | key = <nowiki>(\S+)(\s?)+$\n</nowiki>}}
* Find what: {{kbd | key = <nowiki>(\S+)(\s?)+$\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}}
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}} <br />(to wrap each item in double quotes instead, use Replace with: {{kbd | key = <nowiki>"\1", </nowiki>}})
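The Find/Replace pair above can be tried outside an editor as well. The following Python sketch is only an approximation: Sublime Text and EmEditor use their own regex engines, and the helper name <code>quote_lines</code> and the sample words are assumptions made for illustration. The MULTILINE flag is needed in Python so that $ matches before each newline.

```python
import re

# The Find pattern from the steps above. MULTILINE lets $ match before each \n.
FIND = re.compile(r"(\S+)(\s?)+$\n", re.MULTILINE)


def quote_lines(text: str, quote: str = "'") -> str:
    """Wrap the text of each line in quotes and join with ", " (hypothetical helper)."""
    # \1 refers to the first capture group: the run of non-space characters.
    return FIND.sub(quote + r"\1" + quote + ", ", text)


if __name__ == "__main__":
    # Made-up sample with trailing spaces on some lines.
    print(quote_lines("apple  \nbanana\ncherry \n"))
    print(quote_lines("apple\nbanana\n", quote='"'))
```

Because the pattern requires a trailing \n, a last line without a final newline is left unchanged.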


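Method 2: Use [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor] {{exclaim}} This method does not handle one or more spaces that may appear at the end of each line
Method 2: Use [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor] {{exclaim}} This method does not handle one or more spaces that may appear at the end of each line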
Line 331: Line 344:
<div style="clear:both;">&nbsp;</div>
<div style="clear:both;">&nbsp;</div>


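=== Find non-English text ===
=== Find Chinese and other non-English text ===
Applies to: the Google Drive RegExReplace function and Notepad++ search
Applies to: the [https://support.google.com/docs/answer/3098245?hl=zh-Hant RegExReplace] function in Google Drive spreadsheets and Notepad++ search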
<pre>
<pre>
[^\x00-\x80]+
[^\x00-\x80]+
</pre>
</pre>
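A quick way to see what this character class selects is to run it on a short sample. The Python sketch below is illustrative only; the helper name <code>find_non_english</code> and the sample strings are assumptions. Note that the class matches every character above \x80, so accented Latin letters and emoji are returned as well as Chinese characters.

```python
import re

# The pattern shown above: one or more characters outside the range \x00-\x80.
NON_ASCII = re.compile(r"[^\x00-\x80]+")


def find_non_english(text: str) -> list:
    """Return every run of characters outside \\x00-\\x80 (hypothetical helper)."""
    return NON_ASCII.findall(text)


if __name__ == "__main__":
    # Made-up sample mixing English and Chinese text.
    print(find_non_english("abc 中文 def"))
    print(find_non_english("plain ascii"))
```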


Applies to: the Multi-Rename tool of Total Commander<ref>To replace non-English text while leaving the . character untouched: <nowiki>[^\u0000-\u0080|.]+ </nowiki></ref>
Applies to: the Multi-Rename tool of Total Commander<ref>To replace non-English text while leaving the . character untouched: <nowiki>[^\u0000-\u0080|.]+ </nowiki></ref><ref>[http://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters javascript - Regular expression to match non-english characters? - Stack Overflow]</ref>
<pre>
<pre>
[^\u0000-\u0080]+
[^\u0000-\u0080]+
</pre>
</pre>


Applies to: MySQL<ref>[https://stackoverflow.com/questions/9795137/how-to-detect-rows-with-chinese-characters-in-mysql How to detect rows with chinese characters in MySQL? - Stack Overflow]</ref>
Find column values that contain Chinese characters. Applies to: MySQL<ref>[https://stackoverflow.com/questions/9795137/how-to-detect-rows-with-chinese-characters-in-mysql How to detect rows with chinese characters in MySQL? - Stack Overflow]</ref>
<pre>
<pre>
# Table `table_name` contains the Chinese characters
SELECT `column_name`
SELECT `column_name`
FROM `table_name`
FROM `table_name`
Line 350: Line 362:
</pre>
</pre>


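References:
Find column values that contain Chinese characters. Chinese characters here include both Traditional and Simplified Chinese, but not special symbols such as emoji, e.g. {{kbd | key = ⭐}}.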
* [http://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters javascript - Regular expression to match non-english characters? - Stack Overflow]
 
=== Find Chinese text ===
Chinese characters here include both Traditional and Simplified Chinese, but not special symbols, e.g. {{kbd | key = ⭐}}.
PHP:
PHP:
<pre>
<pre>
Line 530: Line 538:
[[Extract large number from text]]
[[Extract large number from text]]


== Search unmatched string ==
=== Search unmatched string ===
=== case: find un-commented console.log ===
find un-commented console.log:
 
Original format: some lines contain un-commented [[Javascript debug]] information
Original format: some lines contain un-commented [[Javascript debug]] information
<pre>
<pre>
Line 559: Line 568:
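* {{kbd | key=<nowiki>\S</nowiki>}} Non-whitespace character: matches neither half-width (ASCII) spaces nor full-width spaces
* {{kbd | key=<nowiki>\S</nowiki>}} Non-whitespace character: matches neither half-width (ASCII) spaces nor full-width spaces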


== trouble shooting ==
== Troubleshooting of regular expression ==
* [http://errerrors.blogspot.com/2015/07/sublime-text-invalid-lookbehind.html Err: 解決 Sublime Text 正則表示式搜尋,遇到的「Invalid lookbehind assertion」錯誤]
Tips
* Test with small data: (1) prepare a small data file to verify the syntax; (2) use the [[Regular_expression#Regular_expression_online_tools | online tools]].
* Highlight or print the matched text, e.g. with the {{kbd | key=<nowiki>--color</nowiki>}}<ref>[https://www.cyberciti.biz/faq/howto-use-grep-command-in-linux-unix/grep_command_examples/ Grep -color command Examples - nixCraft]</ref> option of the grep command, or print the matches with the PHP [http://php.net/manual/en/function.preg-match.php preg_match()] function.
* Simplify the syntax.
* Because of compatibility issues, try an alternative syntax, e.g. replace {{kbd | key=<nowiki>\d</nowiki>}} with {{kbd | key=<nowiki>[0-9]</nowiki>}}.
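The tips above can be combined into a small test harness. The Python sketch below is one possible way to do it, with a hypothetical helper <code>show_matches</code> and a made-up sample string. It also shows why swapping \d for [0-9] can matter: in Python 3, \d on text strings matches any Unicode decimal digit (including full-width digits), while [0-9] matches ASCII digits only. Other engines may behave differently.

```python
import re


def show_matches(pattern: str, sample: str) -> list:
    """Return every match of `pattern` in `sample` (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, sample)]


if __name__ == "__main__":
    # A small, made-up sample: two ASCII numbers and one full-width number.
    sample = "id 42, tel 0912, wide １２３"
    print(show_matches(r"\d+", sample))     # Unicode-aware digits
    print(show_matches(r"[0-9]+", sample))  # ASCII digits only
```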
 
Related articles
* [https://errerrors.blogspot.com/2015/07/sublime-text-invalid-lookbehind.html Err: 解決 Sublime Text 正則表示式搜尋,遇到的「Invalid lookbehind assertion」錯誤]


== further reading ==
== further reading ==
Line 568: Line 584:
* [http://www.regular-expressions.info/ Regular-Expressions.info - Regex Tutorial, Examples and Reference - Regexp Patterns]
* [http://www.regular-expressions.info/ Regular-Expressions.info - Regex Tutorial, Examples and Reference - Regexp Patterns]
* [http://linux.vbird.org/linux_basic/0320bash.php 鳥哥的 Linux 私房菜 -- 第十章、認識與學習BASH] {{access | date = 2016-06-08}}
* [http://linux.vbird.org/linux_basic/0320bash.php 鳥哥的 Linux 私房菜 -- 第十章、認識與學習BASH] {{access | date = 2016-06-08}}
 
* [https://stackoverflow.com/questions/3548453/negative-matching-using-grep-match-lines-that-do-not-contain-foo Negative matching using grep (match lines that do not contain foo) - Stack Overflow] {{access | date = 2018-04-06}}
* [https://support.google.com/a/answer/1371415?hl=zh-Hant 規則運算式的語法 - G Suite 管理員說明] {{access | date = 2018-12-06}}
unicode
unicode
* [http://www.regular-expressions.info/unicode.html Regex Tutorial - Unicode Characters and Properties] {{access | date = 2014-04-02}}
* [http://www.regular-expressions.info/unicode.html Regex Tutorial - Unicode Characters and Properties] {{access | date = 2014-04-02}}
Line 583: Line 600:
* Copy to dreamweaver from Google Docs: not ok {{exclaim}}
* Copy to dreamweaver from Google Docs: not ok {{exclaim}}
* Copy to MS Excel 2002 from Google Docs: ok
* Copy to MS Excel 2002 from Google Docs: ok
{{Template:Troubleshooting}}


[[Category:Regular expression]] [[Category:Software]] [[Category:Programming]] [[Category:Data Science]] [[Category:Search]] [[Category:Text file processing]]
[[Category:Regular expression]] [[Category:Software]] [[Category:Programming]] [[Category:Data Science]] [[Category:Search]] [[Category:Text file processing]]
