When processing text files with regular expressions, you can quickly search for or replace strings that match specific rules. Strings are processed line by line<ref>[http://linux.vbird.org/linux_basic/0330regularex.php 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理]</ref>. In Chinese, 正規表示法 (regular expression) is also called 正規表示式, 正規表達式, 正則表達式, 正規運算式, 規則運算式 or 常規表示法<ref>[https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F 正規表示式 - 維基百科,自由的百科全書]</ref>.
{{Raise hand | text = Having trouble? Try debugging the pattern yourself with one of the [[Regular_expression#Regular_expression_online_tools | online tools]] that explain the syntax. You can also ask on [http://www.ptt.cc/bbs/RegExp/index.html the RegExp board on PTT (批踢踢實業坊)] or another [[問答服務 | Q&A service]]. }}
== Quick reference ==
Notes: (1) In the sample column, text with a blue background is the part that matches the pattern. (2) The same text rule can often be written in more than one way.
<table border="1" style="width:100%" class="wikitable">
<tr >
<th style="background-color: #E0E0E0;"> Text rule </th>
</tr>
<tr>
<td> One or more ASCII characters (English letters, digits, punctuation and spaces) [http://regexr.com/3aom2 demo]<ref>[http://www.asciitable.com/ Ascii Table - ASCII character codes and html, octal, hex and decimal chart conversion]</ref> <br /> {{kbd | key = <nowiki>[\x00-\x7F]+</nowiki>}} or {{kbd | key = <nowiki>[[:ascii:]]+</nowiki>}}<ref>[https://stackoverflow.com/questions/24903140/regex-for-any-english-ascii-character-including-special-characters php - Regex for Any English ASCII Character Including Special Characters - Stack Overflow]</ref></td>
<td><span style="background:#C6E3FF">What Does the Fox Say? 12</span> 狐狸怎叫 34</td>
<td>One or more non-ASCII characters, e.g. Chinese<br /> {{kbd | key = <nowiki>[^\x00-\x7F]+</nowiki>}}</td>
</tr>
<tr>
<td> One or more upper- or lower-case English letters, digits and underscores ( _ ), excluding spaces ([https://regex101.com/r/gIKB6a/1 demo])<br /> {{kbd | key = <nowiki>[\w]+</nowiki>}} = {{kbd | key = <nowiki>[a-zA-Z0-9_]+</nowiki>}} <br /> In PHP, adding the {{kbd | key =u}} modifier makes it match Chinese characters as well </td>
<td><span style="background:#C6E3FF">What</span> <span style="background:#C6E3FF">Does</span> <span style="background:#C6E3FF">the</span> <span style="background:#C6E3FF">Fox</span> <span style="background:#C6E3FF">Say</span>? <span style="background:#C6E3FF">12</span> 狐狸怎叫 <span style="background:#C6E3FF">_34</span></td>
<td> One or more characters that are not English letters, digits or underscores ( _ ) <br /> {{kbd | key = <nowiki>\W+</nowiki>}} = {{kbd | key = <nowiki>[^a-zA-Z0-9_]+</nowiki>}}</td>
<td>[http://regexr.com/3bk4v demo]</td>
<td>One or more characters that are not digits (spaces included) <br /> {{kbd | key = <nowiki>[^\d]+</nowiki>}} = {{kbd | key = <nowiki>[^0-9]+</nowiki>}} = {{kbd | key = <nowiki>\D+</nowiki>}} </td>
<td><span style="background:#C6E3FF">What Does the Fox Say? </span>12 狐狸怎叫 34</td>
</tr>
<tr>
<td> One or more Chinese characters <br /> {{kbd | key = <nowiki>[\p{Han}]+</nowiki>}} ([https://regex101.com/r/UYkdml/1 demo], [[Regular expression#Find non-ASCII characters | details]])</td>
<td>What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34</td>
<td>One or more characters that are not Chinese characters <br /> {{kbd | key = <nowiki>[^\p{Han}]+</nowiki>}} ([https://regex101.com/r/Nk9GdA/1 demo])</td>
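<td><span style="background:#C6E3FF">What Does the Fox Say? 12 </span>狐狸怎叫<span style="background:#C6E3FF"> 34</span></td>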
</tr>
<tr>
</tr>
<tr>
<td> Lines containing "狐狸" (fox) <br /> {{kbd | key = <nowiki>^.*狐狸.*$</nowiki>}} or {{kbd | key = <nowiki>(狐狸)</nowiki>}} ([https://regex101.com/r/UEtYst/1 demo])</td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br />
What Does the shiba inu say? 柴犬怎叫
</td>
<td>Lines not containing "狐狸" ([https://regex101.com/r/rvncjU/1 demo]) <br /> {{kbd | key = <nowiki>^((?!狐狸).)*$</nowiki>}} </td>
<td>
What Does the Fox Say? 12 狐狸怎叫 34<br />
</tr>
<tr>
<td> Boolean AND: lines containing both "狐狸" (fox) and "叫" (to call) ([http://regexr.com/3aokl demo])<ref>[http://stackoverflow.com/questions/469913/regular-expressions-is-there-an-and-operator regex - Regular Expressions: Is there an AND operator? - Stack Overflow]</ref><br /> {{kbd | key = <nowiki>(?=.*狐狸)(?=.*叫).*</nowiki>}} or {{kbd | key = <nowiki>狐狸.*叫|叫.*狐狸</nowiki>}}</td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34</span><br />
</tr>
<tr>
<td> Boolean OR: lines containing "狐狸" or "叫" ([https://regexr.com/6cu06 demo])<br /> {{kbd | key = <nowiki>.*(狐狸|叫).*</nowiki>}}</td>
<td>
<span style="background:#C6E3FF">What Does the Fox Say? 12 狐狸怎叫 34<br />
</tr>
<tr>
<td> Boolean NOT: lines that contain "柴犬" (shiba inu) but not "狐狸" ([http://regexr.com/3aokr demo])<ref>[http://stackoverflow.com/questions/2953039/regular-expression-for-a-string-containing-one-word-but-not-another regex - Regular expression for a string containing one word but not another - Stack Overflow]</ref><br /> {{kbd | key = <nowiki>^((?!狐狸).)*(柴犬).*$</nowiki>}} = {{kbd | key = <nowiki>^(柴犬).*((?!狐狸).)*$</nowiki>}} = {{kbd | key = <nowiki>(柴犬).*((?!狐狸).)*</nowiki>}} (these patterns may give wrong results when a line contains both "狐狸" and "柴犬") </td>
<td>
What Does the Fox Say? 12 狐狸怎叫 34<br />
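The three Boolean rows of the quick reference table can be checked in code. Below is a minimal sketch in Python (an assumption: the table itself targets PCRE-style testers such as RegExr, and the names {{kbd | key = LINES}} and {{kbd | key = matching_lines}} are hypothetical). For NOT it swaps in {{kbd | key = <nowiki>^(?!.*狐狸).*柴犬.*$</nowiki>}}, which puts the negative lookahead at the start of the line so that "狐狸" is rejected wherever it occurs, including after "柴犬".

```python
import re

# The two sample lines from the quick reference table.
LINES = [
    "What Does the Fox Say? 12 狐狸怎叫 34",
    "What Does the shiba inu say? 柴犬怎叫",
]

# Boolean AND: both words must appear, in any order (two lookaheads).
AND_PATTERN = re.compile(r"(?=.*狐狸)(?=.*叫).*")
# Boolean OR: at least one of the two words appears.
OR_PATTERN = re.compile(r".*(狐狸|叫).*")
# Boolean NOT: contains 柴犬 but not 狐狸. The negative lookahead is anchored
# at the start of the line, so 狐狸 is rejected wherever it occurs.
NOT_PATTERN = re.compile(r"^(?!.*狐狸).*柴犬.*$")


def matching_lines(pattern: re.Pattern, lines: list[str]) -> list[str]:
    """Return the lines in which the pattern finds a match."""
    return [line for line in lines if pattern.search(line)]


if __name__ == "__main__":
    for name, pattern in [("AND", AND_PATTERN), ("OR", OR_PATTERN), ("NOT", NOT_PATTERN)]:
        print(name, matching_lines(pattern, LINES))
```

With a line such as 柴犬和狐狸, the NOT patterns in the table still report a match, while {{kbd | key = NOT_PATTERN}} does not.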
== Regular expression online tools ==
Websites for testing regular expression patterns
* {{Gd}} [http://regex101.com/ RegEx101] "Online regex tester and debugger: PHP, PCRE, Python, Golang and JavaScript" ([http://regex101.com/r/tH1eT7/1 example]). Explains the syntax of your pattern. Tutorial: [https://www.minwt.com/webdesign-dev/html/20352.html RegEx101正規表示法線上產生器,有沒有選到立馬告訴你|梅問題.教學網]
* {{Gd}} [http://gskinner.com/RegExr/ RegExr]: Learn, Build, & Test RegEx ([http://regexr.com/395t0 example]). Explains the syntax of your pattern. Tutorial: [http://blog.hsdn.net/1426.html RegExr: 功能強大的正規式撰寫協助工具]
* [https://regexper.com/ Regexper]: explains the syntax visually as a diagram, e.g. [https://regexper.com/#%5Cd%7B3%7D%28.*%29 \d{3}(.*)]
* [https://jex.im/regulex/ Regulex: JavaScript Regular Expression Visualizer]: explains the syntax visually as a diagram, e.g. [https://jex.im/regulex/#!flags=&re=%5E(a%7Cb)*%3F%24 ^(a|b)*?$]
* [http://www.rubular.com/ Rubular]: a Ruby regular expression editor and tester ([http://www.rubular.com/r/UZuUT5pjeh example])
* [http://www.phpliveregex.com/ PHP Live Regex] {{access | date=2014-11-25}}
* [http://www.regextester.com/ Regex Tester and Debugger Online - Javascript, PCRE, PHP] {{access | date=2016-01-07}}
* [http://rocksaying.tw/archives/2670695.html Regular Expression (RegExp) in JavaScript - 石頭閒語] {{access | date=2017-11-14}}
Examples
* {{Gd}} [http://regexlib.com/ Regular Expression Library]: pattern examples contributed by users
== cases ==
=== Replace line breaks with commas ===
Convert a list of email addresses (one per line) into a recipient list that email software can use
<pre>
Original:
Changed to:
</pre>
# Menu: Search -> Replace
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>\n</nowiki>}} ([[Return symbol | line break]]. On {{Win}} the line break is {{kbd | key = <nowiki>\r\n</nowiki>}}, while {{Mac}} and {{Linux}} use {{kbd | key = <nowiki>\n</nowiki>}}; searching for {{kbd | key = <nowiki>\n</nowiki>}}, the character both have in common, therefore works for either. Only files from the classic Mac OS (before OS X) use {{kbd | key = <nowiki>\r</nowiki>}}.)
## Replace with: {{kbd | key = <nowiki>, </nowiki>}}
# click "Replace all"
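The same replacement can also be scripted. A minimal sketch in Python (the function name {{kbd | key = join_lines}} and the sample addresses are hypothetical): {{kbd | key = <nowiki>\r?\n</nowiki>}} matches both Windows ({{kbd | key = <nowiki>\r\n</nowiki>}}) and macOS/Linux ({{kbd | key = <nowiki>\n</nowiki>}}) line breaks, and trailing line breaks are removed first so the list does not end with a dangling comma.

```python
import re


def join_lines(text: str, separator: str = ", ") -> str:
    """Replace every line break (CRLF or LF) with the separator."""
    # Remove trailing line breaks first so the result does not end with a separator.
    text = text.rstrip("\r\n")
    # \r? matches Windows (\r\n) and macOS/Linux (\n) line breaks alike.
    # A function replacement keeps any backslash in the separator literal.
    return re.sub(r"\r?\n", lambda _match: separator, text)


if __name__ == "__main__":
    emails = "alice@example.com\r\nbob@example.com\ncarol@example.com\n"
    print(join_lines(emails))  # alice@example.com, bob@example.com, carol@example.com
```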
# Save the file
Related resources
* [https://www.hexdictionary.com/ Hex Dictionary | Convert Hex / Hexadecimal Numbers to Binary and Decimal]
=== Find IP address (IPv4) ===
For [http://notepad-plus-plus.org/ Notepad++] v5.9.5
# Menu: Search -> Replace
# Search Mode: select "Regular expression"
## Find what: {{kbd | key=<nowiki>\d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?</nowiki>}}
Note: this version does not support the {{kbd | key=<nowiki>{n}</nowiki>}} quantifier syntax, so each optional digit is written out with {{kbd | key=<nowiki>?</nowiki>}}.
For [https://www.sublimetext.com/ Sublime Text] v3.2.21
# Find: {{kbd | key=<nowiki>(?:\d{1,3}\.){3}\d{1,3}</nowiki>}}
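Both patterns above accept any one to three digits in each part, so they also match strings such as {{kbd | key = 999.999.999.999}}. If the goal is to validate rather than merely find addresses, each octet has to be limited to 0 to 255. A minimal Python sketch, assuming the octet pattern {{kbd | key = <nowiki>25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d</nowiki>}} (one common way to write it; it rejects leading zeros such as {{kbd | key = 01}}); the names {{kbd | key = OCTET}} and {{kbd | key = find_ipv4}} are hypothetical.

```python
import re

# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
# Four octets joined by dots. The lookbehind rejects a start in the middle of a
# longer number; the lookahead rejects extra digits or a fifth ".digits" part.
IPV4 = re.compile(rf"(?<![\d.]){OCTET}(?:\.{OCTET}){{3}}(?!\.?\d)")


def find_ipv4(text: str) -> list[str]:
    """Return every well-formed dotted-decimal IPv4 address in the text."""
    return IPV4.findall(text)


if __name__ == "__main__":
    print(find_ipv4("Server 192.168.0.1 is up; 999.999.999.999 is not an address."))
    # ['192.168.0.1']
```

A trailing sentence period (as in {{kbd | key = 10.0.0.255.}}) is still accepted, because the lookahead only rejects a dot that is followed by a digit.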
References:
* [https://www.regular-expressions.info/ip.html How to Find or Validate an IP Address] {{access | date = 2019-06-05}}
* [http://sourceforge.net/projects/notepad-plus/forums/forum/331754/topic/4780602 SourceForge.net: Notepad++: Regular expression for IP addresses]
* [http://stackoverflow.com/questions/53497/regular-expression-that-matches-valid-ipv6-addresses regex - Regular expression that matches valid IPv6 addresses - Stack Overflow] {{access | date = 2015-08-10}}
'Elmo', 'Emie', 'Granny Bird'
</pre>
Method 1: use [http://www.sublimetext.com/ Sublime Text], [https://notepad-plus-plus.org/downloads/ Notepad++] or [https://zh-tw.emeditor.com/ EmEditor]. This method tolerates one or more spaces at the end of each line. (Leading spaces are not removed, and because {{kbd | key = <nowiki>\S+</nowiki>}} cannot cross a space, a line with an inner space such as {{kbd | key = Granny Bird}} has only its last word quoted.)
If you use {{Mac}}:
* Find what: {{kbd | key = <nowiki>(\S+)(\s?)+$\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}} <br />(to enclose the values in double quotes instead, Replace with: {{kbd | key = <nowiki>"\1", </nowiki>}})
If you use {{Win}}, change the line break {{kbd | key = <nowiki>\n</nowiki>}} to {{kbd | key = <nowiki>\r\n</nowiki>}}:
* Find what: {{kbd | key = <nowiki>(\S+)(\s?)+$\r\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}} <br />(to enclose the values in double quotes instead, Replace with: {{kbd | key = <nowiki>"\1", </nowiki>}})
Method 2: use [http://www.sublimetext.com/ Sublime Text] or [https://zh-tw.emeditor.com/ EmEditor]. {{exclaim}} This method does not handle spaces at the end of each line.
* Find what: {{kbd | key = <nowiki>(.*)$\n</nowiki>}} or {{kbd | key = <nowiki>(\S+)$\n</nowiki>}} or {{kbd | key = <nowiki>(\S+)\n</nowiki>}}
* Replace with: {{kbd | key = <nowiki>'\1', </nowiki>}}
More details on the page [[Add quotation at the start and end of each line | add quotation at the start and end of each line]].
</div>
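A scripted version of the quoting transformation above is sketched below in Python (the names {{kbd | key = LINE_VALUE}} and {{kbd | key = quote_lines}} are hypothetical). It captures each whole line between optional leading and trailing spaces or tabs, so multi-word values such as {{kbd | key = Granny Bird}} stay intact, {{kbd | key = <nowiki>\r?$</nowiki>}} copes with both Windows and macOS/Linux line breaks, and joining the values avoids a trailing comma.

```python
import re

# Capture a line's text without the spaces/tabs around it; inner spaces are kept.
LINE_VALUE = re.compile(r"^[ \t]*(\S(?:.*\S)?)[ \t]*\r?$", flags=re.MULTILINE)


def quote_lines(text: str, quote: str = "'") -> str:
    """Wrap each non-blank line in quotes and join them with a comma and a space."""
    return ", ".join(f"{quote}{value}{quote}" for value in LINE_VALUE.findall(text))


if __name__ == "__main__":
    names = "Elmo\n  Emie  \r\nGranny Bird \n\n"
    print(quote_lines(names))       # 'Elmo', 'Emie', 'Granny Bird'
    print(quote_lines(names, '"'))  # "Elmo", "Emie", "Granny Bird"
```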
<div style="clear:both;"> </div>
=== Enclose spreadsheet cell values in double quotes ===
* [https://errerrors.blogspot.com/2019/03/how-to-enclose-non-empty-cell-with-double-quotes-in-google-spreadsheet.html Google 試算表的文字類型欄位值的前後加上雙引號] (adding double quotes around text values in Google Sheets)
=== Find non-ASCII characters ===
==== Find non-ASCII characters in Google sheet ====
Applies to: the regular expression functions of Google Drive spreadsheets (Google Sheets), such as [https://support.google.com/docs/answer/3098292?hl=zh-Hant REGEXMATCH], [https://support.google.com/docs/answer/3098244?hl=en REGEXEXTRACT] and [https://support.google.com/docs/answer/3098245?hl=zh-Hant RegExReplace], and to searching in Notepad++
<pre>
[^\x00-\x7F]+
</pre>
==== Find non-ASCII characters in LibreOffice ====
Applies to: the [https://zh-tw.libreoffice.org/ LibreOffice] [https://help.libreoffice.org/6.2/en-US/text/scalc/01/func_regex.html REGEX] function<ref>[https://help.libreoffice.org/6.2/en-US/text/shared/01/02100001.html?&DbPAR=WRITER&System=MAC List of Regular Expressions]</ref> and the Multi-Rename tool in Total Commander<ref>To replace non-English text while leaving the . character untouched, <nowiki>[^\u0000-\u007F]+</nowiki> is enough: . lies inside the ASCII range, so it is never matched.</ref><ref>[http://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters javascript - Regular expression to match non-english characters? - Stack Overflow]</ref>
<pre>
[^\u0000-\u007F]+
</pre>
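ASCII covers the code points U+0000 to U+007F, so the character classes in this section can be tried out in Python as {{kbd | key = <nowiki>[^\x00-\x7F]+</nowiki>}}; Python's {{kbd | key = re}} module accepts both the {{kbd | key = <nowiki>\x7F</nowiki>}} and the {{kbd | key = <nowiki>\u007F</nowiki>}} escape forms in text patterns. A minimal sketch; the function name is hypothetical and the sample sentence is the one from the quick reference table.

```python
import re

# ASCII is U+0000 to U+007F, so this class matches anything outside it.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def find_non_ascii(text: str) -> list[str]:
    """Return each run of consecutive non-ASCII characters."""
    return NON_ASCII.findall(text)


if __name__ == "__main__":
    print(find_non_ascii("What Does the Fox Say? 12 狐狸怎叫 34"))  # ['狐狸怎叫']
```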
==== Find Chinese characters in Google sheet ====
Example: if A2 contains any Chinese character, the cell shows "中文" (Chinese); if it contains none, the cell shows "英文" (English). The range 一 to 龥 corresponds to U+4E00 to U+9FA5:
<pre>
=IF(REGEXMATCH(A2, "[\一-\龥]"), "中文", "英文")
</pre>
{{exclaim}} Google Sheets does not support the following syntax and shows the error "... 是無效的規則運算式。" ("... is not a valid regular expression."):
* {{kbd | key=<nowiki>[\u4e00-\u9fa5]</nowiki>}}
* {{kbd | key=<nowiki>[^\u4e00-\u9fa5]</nowiki>}}
* {{kbd | key=<nowiki>[\p{Script=Hans}]</nowiki>}}
* {{kbd | key=<nowiki>[\p{Han}]</nowiki>}}
==== Find Chinese characters in MySQL ====
Find rows whose `column_name` value contains Chinese characters. Applies to: MySQL<ref>[https://stackoverflow.com/questions/9795137/how-to-detect-rows-with-chinese-characters-in-mysql How to detect rows with chinese characters in MySQL? - Stack Overflow]</ref><ref>[https://stackoverflow.com/questions/401771/how-can-i-find-non-ascii-characters-in-mysql How can I find non-ASCII characters in MySQL? - Stack Overflow]</ref>
<pre>
SELECT `column_name`
</pre>
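Explanation
* The regular expression {{kbd | key = <nowiki>'^(..)*(E[4-9])'</nowiki>}} is matched against the hexadecimal form of the value (for example the output of HEX()): from the start of the string ({{kbd | key = <nowiki>^</nowiki>}}), it skips zero or more ({{kbd | key = <nowiki>*</nowiki>}}) pairs of hex digits ({{kbd | key = <nowiki>..</nowiki>}}), i.e. whole bytes, until it reaches a byte from E4 to E9 ({{kbd | key = <nowiki>E[4-9]</nowiki>}}).
* Assuming the column is stored as UTF-8, a byte from E4 to E9 is the first byte of a three-byte character in the range U+4000 to U+9FFF, which contains the common CJK Unified Ideographs (U+4E00 to U+9FFF). The {{kbd | key = <nowiki>^(..)*</nowiki>}} prefix keeps the match aligned on byte boundaries, so hex digits that only look like "E4" across two bytes (for example the bytes 3E 41, written as 3E41) do not cause a false match.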
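The effect of the {{kbd | key = <nowiki>^(..)*</nowiki>}} prefix can be reproduced outside MySQL. The Python sketch below is an illustration, not the MySQL query itself: it assumes the value is UTF-8 and mimics HEX() by turning the bytes into upper-case hexadecimal, then compares the pattern with and without the byte-alignment prefix. The helper names are hypothetical.

```python
import re

# E4-E9 must start on a byte boundary: skip whole bytes (two hex digits) first.
ALIGNED = re.compile(r"^(..)*(E[4-9])")
# Without the prefix, "E4" may straddle two bytes and cause a false match.
UNALIGNED = re.compile(r"E[4-9]")


def to_hex(value: str) -> str:
    """Mimic MySQL HEX() on a UTF-8 string: upper-case hex of its bytes."""
    return value.encode("utf-8").hex().upper()


def has_cjk_lead_byte(value: str) -> bool:
    """True if the UTF-8 bytes contain a lead byte E4 to E9 (U+4000 to U+9FFF)."""
    return ALIGNED.search(to_hex(value)) is not None


if __name__ == "__main__":
    for sample in ["abc中", ">A"]:
        print(sample, to_hex(sample), has_cjk_lead_byte(sample),
              UNALIGNED.search(to_hex(sample)) is not None)
    # abc中 616263E4B8AD True True
    # >A 3E41 False True
```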
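==== Find non-ASCII characters in MySQL ====
Find rows whose `column_name` value is not made up entirely of ASCII characters. CONVERT(... USING ASCII) replaces characters that cannot be represented in ASCII with ?, so the converted value differs from the original.
<pre>
SELECT `column_name`
FROM `table_name`
WHERE `column_name` <> CONVERT(`column_name` USING ASCII)
</pre>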
==== Find non-ASCII characters in PHP ====
Find values that contain Chinese characters (both Traditional and Simplified Chinese), excluding punctuation (e.g. {{kbd | key = <nowiki>,</nowiki>}}), full-width punctuation (e.g. {{kbd | key = <nowiki>,</nowiki>}}) and special symbols such as emoji ({{kbd | key = ⭐}}).
PHP: exact match
<pre>
// approach 1
</pre>
partial match ([http://sandbox.onlinephpfunctions.com/code/d780845d20877c0fd2e693b28ed02a10d250d39e online demo] hosted by [http://sandbox.onlinephpfunctions.com/ PHP Sandbox])
<pre>
// approach 1
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\p{Han}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);
// approach 2
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐';
$pattern = '/[\x{4e00}-\x{9fa5}]+/u';
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);
</pre>
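For comparison, a Python sketch of approach 2 (an assumption: Python's built-in {{kbd | key = re}} module does not support {{kbd | key = <nowiki>\p{Han}</nowiki>}}, so only the {{kbd | key = <nowiki>\u4e00-\u9fa5</nowiki>}} range is used; the third-party {{kbd | key = regex}} module would be needed for {{kbd | key = <nowiki>\p{Han}</nowiki>}}). The range U+4E00 to U+9FA5 covers the common CJK Unified Ideographs but not the extension blocks. The function name is hypothetical.

```python
import re

# Same range as approach 2: U+4E00 to U+9FA5.
HAN_RANGE = re.compile(r"[\u4e00-\u9fa5]+")


def find_chinese(text: str) -> list[str]:
    """Return each run of consecutive characters in U+4E00 to U+9FA5."""
    return HAN_RANGE.findall(text)


if __name__ == "__main__":
    sample = "繁體中文-简体中文-English-12345-。,!-.,!-⭐"
    print(find_chinese(sample))  # ['繁體中文', '简体中文']
```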
Troubleshooting: error message
<pre>preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 8</pre>
Solution: add the {{kbd | key = u }} (UTF-8) modifier to the pattern passed to [http://php.net/manual/en/function.preg-match.php preg_match()]; without it, code points above 0xFF such as {{kbd | key = <nowiki>\x{4e00}</nowiki>}} are rejected<ref>[https://stackoverflow.com/questions/32375531/preg-match-compilation-failed-character-value-in-x-or-o-is-too-large-a php - preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 27 on line number 25 - Stack Overflow]</ref>.
==== Find non-ASCII characters in JavaScript ====
* [https://stackoverflow.com/questions/21109011/javascript-unicode-string-chinese-character-but-no-punctuation regex - Javascript unicode string, chinese character but no punctuation - Stack Overflow]
References:
* [http://blog.csdn.net/tinyletero/article/details/8201465 unicode编码 \u4e00-\u9fa5 匹配所有中文 - CSDN博客]
* [https://stackoverflow.com/questions/38168419/codeigniter-form-validation-for-chinese-words php - CodeIgniter Form Validation for Chinese Words - Stack Overflow]
* [https://zh.wikipedia.org/zh-tw/%E4%B8%AD%E6%97%A5%E9%9F%93%E7%B5%B1%E4%B8%80%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97%E5%88%97%E8%A1%A8 中日韓統一表意文字列表 - 維基百科,自由的百科全書]
=== Find English text ===
==== Find ASCII characters in MySQL ====
<pre>
-- Find rows whose `my_column` value consists only of ASCII characters
SELECT *
FROM `my_table`
WHERE `my_column` LIKE CONVERT(`my_column` USING ASCII)
</pre>
Related articles
* [https://errerrors.blogspot.com/2020/07/search-app-not-apple-in-englishsearching.html 解決英文字的搜尋:搜尋 app 而不是 apple] (searching for the English word "app" without matching "apple")
References
* [https://stackoverflow.com/questions/401771/how-can-i-find-non-ascii-characters-in-mysql How can I find non-ASCII characters in MySQL? - Stack Overflow]
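==== Find English letters, digits, hyphens (-) or underscores (_) in MySQL ====
<pre>
-- Find rows whose `my_column` value contains at least one English letter, digit, hyphen (-) or underscore (_)
-- The hyphen is placed last in the character class so that it is taken literally
SELECT *
FROM `my_table`
WHERE `my_column` REGEXP '[a-zA-Z0-9_-]'
</pre>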
=== Add a comma at the beginning of each line ===
[[Adding characters to document lines]]
=== You remember the text before and after, but not the text in between ===
* Tools: Sublime Text and EmEditor, with "Use regular expression" enabled. {{exclaim}} The following pattern does not work in Notepad++<ref>[http://www.sitepoint.com/forums/showthread.php?448843-Regex-delete-multiple-blank-lines Regex: delete multiple blank lines]</ref>
** Find: {{kbd | key=<nowiki>^[\s\t]*$\n</nowiki>}} --> Replace with: nothing (leave the field empty)
* Tool: Notepad++ v7.8.7
** Notepad++ menu: Edit -> Line Operations -> Remove Empty Lines (Containing Blank characters)<ref>[http://stackoverflow.com/questions/3866034/removing-empty-lines-in-notepad regex - Removing empty lines in Notepad++ - Stack Overflow]</ref>
* For details, see [[Regular replace blank lines]]
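A minimal Python sketch of the same clean-up (the names are hypothetical). It uses {{kbd | key = <nowiki>[ \t]*</nowiki>}} rather than {{kbd | key = <nowiki>\s*</nowiki>}} so that each match covers exactly one blank line, and {{kbd | key = <nowiki>\r?\n</nowiki>}} handles both Windows and macOS/Linux line endings. A blank last line without a trailing line break is left as is.

```python
import re

# A line made only of spaces or tabs (possibly none), plus its line break.
BLANK_LINE = re.compile(r"^[ \t]*\r?\n", flags=re.MULTILINE)


def remove_blank_lines(text: str) -> str:
    """Delete empty lines and lines that contain only spaces or tabs."""
    return BLANK_LINE.sub("", text)


if __name__ == "__main__":
    print(repr(remove_blank_lines("a\n\n   \nb\n")))  # 'a\nb\n'
```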
=== Find non-whitespace text ===
* Find: {{kbd | key=<nowiki>[^\s]+</nowiki>}} [https://regex101.com/r/zH7wV3/1 online demo]
* [https://errerrors.blogspot.com/2022/01/avoid-whitespace-character-caused-program-stop-abnormally.html 解決遇到空白段落發生程式異常錯誤而執行中斷的問題] (fixing a program that stops with an error when it reaches a blank paragraph): "... characters that look like whitespace but cannot be removed with the TRIM function may be other kinds of whitespace characters. The solution is to check whether the paragraph contains any Chinese or English letters or digits before processing it further."
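The check described in the quoted article (does the paragraph contain any Chinese or English letters or digits?) might look like this in Python. A sketch under assumptions: "Chinese" is approximated by the range U+4E00 to U+9FA5, the function name is hypothetical, and the sample uses an ideographic space (U+3000) and a no-break space (U+00A0), two whitespace characters that are not the ordinary ASCII space.

```python
import re

# At least one ASCII letter, ASCII digit, or CJK ideograph in U+4E00 to U+9FA5.
MEANINGFUL = re.compile(r"[A-Za-z0-9\u4e00-\u9fa5]")


def has_content(paragraph: str) -> bool:
    """True if the paragraph contains Chinese or English letters, or digits."""
    return MEANINGFUL.search(paragraph) is not None


if __name__ == "__main__":
    # U+3000 (ideographic space) and U+00A0 (no-break space) look blank.
    for text in ["\u3000\u00a0", " 狐狸 ", "Fox 12"]:
        print(repr(text), has_content(text))
```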
=== Remove punctuation, special symbols, etc. ===
* [https://stackoverflow.com/questions/5689918/php-strip-punctuation/5689989 regex - PHP strip punctuation - Stack Overflow]
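The linked answer is about PHP; a comparable Python sketch is below (an assumption, not taken from the linked page; the function name is hypothetical). In Python 3, {{kbd | key = <nowiki>\w</nowiki>}} is Unicode-aware for text patterns, so {{kbd | key = <nowiki>[^\w\s]</nowiki>}} removes punctuation and symbols such as emoji while keeping letters, digits, Chinese characters, whitespace and the underscore.

```python
import re

# Anything that is neither a word character (Unicode-aware in Python 3) nor whitespace.
PUNCT_OR_SYMBOL = re.compile(r"[^\w\s]")


def strip_punctuation(text: str) -> str:
    """Remove punctuation and symbols; keep letters, digits, CJK, '_' and whitespace."""
    return PUNCT_OR_SYMBOL.sub("", text)


if __name__ == "__main__":
    print(strip_punctuation("Hello, 世界!⭐ ok_1。"))  # Hello 世界 ok_1
```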
=== Put text separated by a specific symbol on separate lines ===
* <nowiki>[、]{1}</nowiki> : one occurrence of the enumeration comma (、)
* <nowiki>([、]{1})</nowiki> : a capturing group around the text matched by the rule "one occurrence of the enumeration comma (、)"
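Assuming the goal of this section is to replace each enumeration comma with a line break, a minimal Python sketch (the function name and the sample list 蘋果、香蕉、橘子 are hypothetical). Note that {{kbd | key = <nowiki>[、]{1}</nowiki>}} is equivalent to a plain 、.

```python
import re


def split_to_lines(text: str, separator: str = "、") -> str:
    """Replace each occurrence of the separator with a line break."""
    # re.escape keeps separators such as "." or "|" from acting as metacharacters.
    return re.sub(re.escape(separator), "\n", text)


if __name__ == "__main__":
    print(split_to_lines("蘋果、香蕉、橘子"))
```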
=== Add one space (half-width) at the end of each line ===
</pre>
Explanation: \S means a non-whitespace character and \r\n is the [[Return symbol | line break]]. [^\S\r\n] therefore matches a character that is neither a non-whitespace character nor a line break character; in other words, it matches whitespace other than line breaks.
Using Sublime Text (references<ref>[http://www.techrepublic.com/blog/microsoft-office/quickly-replace-multiple-space-characters-with-a-tab-character/ Quickly replace multiple space characters with a tab character - TechRepublic]</ref> <ref>[http://stackoverflow.com/questions/3469080/match-whitespace-but-not-newlines-perl regex - Match whitespace but not newlines (Perl) - Stack Overflow]</ref>)
# Menu: Search -> Replace
# click "Use Regular Expression"
## Find: {{kbd | key = <nowiki>([^\S\n]+)</nowiki>}} or {{kbd | key = <nowiki>([^\S\r\n]+)</nowiki>}} or {{kbd | key = <nowiki>\s\s+</nowiki>}} or {{kbd | key = <nowiki>_{1,}</nowiki>}} (replace _ with a half-width space yourself) {{exclaim}} Because {{kbd | key = <nowiki>\s</nowiki>}} matches both spaces and line break characters, {{kbd | key = <nowiki>\s+</nowiki>}} cannot be used on its own as the search pattern; for the same reason, {{kbd | key = <nowiki>\s\s+</nowiki>}} can also match line breaks
## Replace with: {{kbd | key = <nowiki>\t</nowiki>}}
# click "Replace all"
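A Python sketch of the same replacement, using the {{kbd | key = <nowiki>[^\S\r\n]+</nowiki>}} pattern explained above (the names and sample rows are hypothetical). The second print shows why plain {{kbd | key = <nowiki>\s+</nowiki>}} is avoided: it also consumes the line breaks and merges the rows.

```python
import re

# Whitespace that is not a line break: [^\S\r\n] = not non-whitespace, not CR, not LF.
HORIZONTAL_SPACE = re.compile(r"[^\S\r\n]+")


def spaces_to_tabs(text: str) -> str:
    """Replace each run of whitespace other than CR and LF with one tab."""
    return HORIZONTAL_SPACE.sub("\t", text)


if __name__ == "__main__":
    rows = "Elmo   3\nEmie  5\n"
    print(repr(spaces_to_tabs(rows)))        # 'Elmo\t3\nEmie\t5\n'
    print(repr(re.sub(r"\s+", "\t", rows)))  # 'Elmo\t3\tEmie\t5\t' (rows merged)
```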
=== Find URLs in article text ===
[[Extract url from text]]
=== Find numbers ===
See [[Data cleaning#Numeric]]
* [[Extract large number from text | Find long numbers in article text]]
* [https://errerrors.blogspot.com/2020/02/convert-minguo-calendar-to-common-era-using-google-sheet.html Google 試算表將民國轉西元日期] (converting Minguo (ROC) calendar dates to Common Era dates in Google Sheets)
=== Remove text within brackets ===
See [[Remove text within brackets]]
=== Search unmatched string ===
[^/](console\.log)
</pre>
== Text editor with support for regular expression ==
[[Text editor with support for regular expression]]
== Regular expression batch tools ==
== Troubleshooting of regular expression ==
Tips
* Use the online tool [https://regex101.com/ regex101: build, test, and debug regex] to get an explanation of your pattern
* Test with small data: (1) prepare a small sample file to verify the pattern; (2) use the [[Regular_expression#Regular_expression_online_tools | online tools]]
* Highlight or print the matched text, e.g. with the {{kbd | key=<nowiki>--color</nowiki>}}<ref>[https://www.cyberciti.biz/faq/howto-use-grep-command-in-linux-unix/grep_command_examples/ Grep -color command Examples - nixCraft]</ref> option of the grep command, or print the matches with the PHP [http://php.net/manual/en/function.preg-match.php preg_match()] function.
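In the same spirit as grep's {{kbd | key=<nowiki>--color</nowiki>}}, a small Python helper can print the text with each match marked. A sketch; the function name and the bracket markers are arbitrary choices, not part of any tool mentioned above.

```python
import re


def mark_matches(pattern: str, text: str, left: str = "[", right: str = "]") -> str:
    """Return the text with every match of the pattern wrapped in markers."""
    return re.sub(pattern, lambda m: f"{left}{m.group(0)}{right}", text)


if __name__ == "__main__":
    print(mark_matches(r"\d+", "What Does the Fox Say? 12 狐狸怎叫 34"))
    # What Does the Fox Say? [12] 狐狸怎叫 [34]
```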
Related articles
* [https://errerrors.blogspot.com/2015/07/sublime-text-invalid-lookbehind.html Err: 解決 Sublime Text 正則表示式搜尋,遇到的「Invalid lookbehind assertion」錯誤] (fixing the "Invalid lookbehind assertion" error in Sublime Text regular expression search)
== further reading ==
* [http://linux.vbird.org/linux_basic/0320bash.php 鳥哥的 Linux 私房菜 -- 第十章、認識與學習BASH] {{access | date = 2016-06-08}}
* [https://stackoverflow.com/questions/3548453/negative-matching-using-grep-match-lines-that-do-not-contain-foo Negative matching using grep (match lines that do not contain foo) - Stack Overflow] {{access | date = 2018-04-06}}
* [https://support.google.com/a/answer/1371415?hl=zh-Hant 規則運算式的語法 - G Suite 管理員說明] (regular expression syntax, G Suite Admin Help) {{access | date = 2018-12-06}}
unicode
* [http://www.regular-expressions.info/unicode.html Regex Tutorial - Unicode Characters and Properties] {{access | date = 2014-04-02}}
{{Template:Troubleshooting}}
[[Category:Regular expression]] [[Category:Software]] [[Category:Programming]] [[Category:Data Science]] [[Category:Search]] [[Category:String manipulation]]