m (→Quick reference table) |
m (→Finding Chinese and non-English text) |
||
(6 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
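Regular expressions (Regular Expression) make it possible to quickly search for or replace strings that match specific rules when processing text files. Strings are processed line by line<ref>[http://linux.vbird.org/linux_basic/0330regularex.php 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理]</ref>. Regular expressions | Regular expressions (Regular Expression) make it possible to quickly search for or replace strings that match specific rules when processing text files. Strings are processed line by line<ref>[http://linux.vbird.org/linux_basic/0330regularex.php 鳥哥的 Linux 私房菜 -- 正規表示法 (regular expression, RE) 與文件格式化處理]</ref>. Regular expressions are known in Chinese as 正規表示式, 正規表達式, 正則表達式, 正規表示法, 正規運算式, 規則運算式, or 常規表示法<ref>[https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F 正規表示式 - 維基百科,自由的百科全書]</ref>. | ||
{{Raise hand | text = Have a question? Try debugging it yourself with one of the [[Regular_expression#Regular_expression_online_tools | online tools]] that explain each pattern. You can also ask on [http://www.ptt.cc/bbs/RegExp/index.html 看板 RegExp 文章列表 - 批踢踢實業坊] or on other [[問答服務 | Q&A services]]. }} | {{Raise hand | text = Have a question? Try debugging it yourself with one of the [[Regular_expression#Regular_expression_online_tools | online tools]] that explain each pattern. You can also ask on [http://www.ptt.cc/bbs/RegExp/index.html 看板 RegExp 文章列表 - 批踢踢實業坊] or on other [[問答服務 | Q&A services]]. }} | ||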
Line 61: | Line 61: | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td> One or more Chinese characters <br /> {{kbd | key = <nowiki>[\p{Han}]+</nowiki>}} ([https://regex101.com/r/UYkdml/1 demo])</td> | <td> One or more Chinese characters <br /> {{kbd | key = <nowiki>[\p{Han}]+</nowiki>}} ([https://regex101.com/r/UYkdml/1 demo], [[Regular expression#尋找中文、非英文的文字 | details]])</td> | ||
<td>What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34</td> | <td>What Does the Fox Say? 12 <span style="background:#C6E3FF">狐狸怎叫</span> 34</td> | ||
<td>One or more characters that are not Chinese characters <br /> {{kbd | key = <nowiki>[^\p{Han}]+</nowiki>}} ([https://regex101.com/r/Nk9GdA/1 demo])</td> | <td>One or more characters that are not Chinese characters <br /> {{kbd | key = <nowiki>[^\p{Han}]+</nowiki>}} ([https://regex101.com/r/Nk9GdA/1 demo])</td> | ||
Line 364: | Line 364: | ||
</pre> | </pre> | ||
Applies to: | Applies to: the [https://zh-tw.libreoffice.org/ LibreOffice] [https://help.libreoffice.org/6.2/en-US/text/scalc/01/func_regex.html REGEX] function<ref>[https://help.libreoffice.org/6.2/en-US/text/shared/01/02100001.html?&DbPAR=WRITER&System=MAC List of Regular Expressions]</ref> and the Multi-Rename tool in Total Commander<ref>To replace non-English text while leaving the . character alone: <nowiki>[^\u0000-\u0080|.]+ </nowiki> (inside a character class the | is a literal, so this pattern also leaves | alone)</ref><ref>[http://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters javascript - Regular expression to match non-english characters? - Stack Overflow]</ref> | ||
<pre> | <pre> | ||
[^\u0000-\u0080]+ | [^\u0000-\u0080]+ | ||
Line 376: | Line 376: | ||
</pre> | </pre> | ||
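Find field values that contain Chinese characters. Chinese characters here cover both Traditional and Simplified Chinese and exclude punctuation (e.g. {{kbd | key = <nowiki>,</nowiki>}}), full-width punctuation (e.g. {{kbd | key = <nowiki>,</nowiki>}}), and special symbols such as emoji: {{kbd | key = ⭐}}. | |||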
PHP: | PHP: exact match | ||
<pre> | <pre> | ||
// approach 1 | // approach 1 | ||
Line 394: | Line 394: | ||
</pre> | </pre> | ||
Troubleshooting: | partial match ([http://sandbox.onlinephpfunctions.com/code/d780845d20877c0fd2e693b28ed02a10d250d39e online demo] hosted by [http://sandbox.onlinephpfunctions.com/ PHP Sandbox]) | ||
<pre> | |||
// approach 1: Unicode script property \p{Han} (requires the /u modifier) | |||
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐'; | |||
$pattern = '/[\p{Han}]+/u'; | |||
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE); | |||
var_dump($matches); | |||
// approach 2: explicit code point range U+4E00 to U+9FA5, narrower than \p{Han} | |||
$string = '繁體中文-简体中文-English-12345-。,!-.,!-⭐'; | |||
$pattern = '/[\x{4e00}-\x{9fa5}]+/u'; | |||
preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE); | |||
var_dump($matches); | |||
</pre> | |||
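For comparison, the partial match can be sketched outside PHP. The following is a minimal Python sketch, not part of the original page: it mirrors approach 2 only, because the standard-library <code>re</code> module does not support <code>\p{Han}</code>. The helper name <code>find_han_runs</code> is a hypothetical name chosen for illustration.

```python
import re

# Sample string modeled on the PHP example above.
text = '繁體中文-简体中文-English-12345-。,!-.,!-⭐'

# Explicit range U+4E00..U+9FA5, mirroring approach 2.
# Assumption: the standard-library `re` module is used, which has no \p{Han},
# so approach 1 is not reproduced here.
HAN_RUN = re.compile(r'[\u4e00-\u9fa5]+')


def find_han_runs(s):
    """Return (matched_text, start_offset) for each run of characters in the range."""
    return [(m.group(), m.start()) for m in HAN_RUN.finditer(s)]


print(find_han_runs(text))
# [('繁體中文', 0), ('简体中文', 5)]
```

Note that Python reports offsets in characters, whereas PHP's PREG_OFFSET_CAPTURE reports byte offsets, so the second offset differs between the two.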
Troubleshooting: error message | |||
<pre>preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 8</pre> | |||
Solution: the pattern passed to [http://php.net/manual/en/function.preg-match.php preg_match()] needs the {{kbd | key = u }} modifier<ref>[https://stackoverflow.com/questions/32375531/preg-match-compilation-failed-character-value-in-x-or-o-is-too-large-a php - preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 27 on line number 25 - Stack Overflow]</ref>. | Solution: the pattern passed to [http://php.net/manual/en/function.preg-match.php preg_match()] needs the {{kbd | key = u }} modifier<ref>[https://stackoverflow.com/questions/32375531/preg-match-compilation-failed-character-value-in-x-or-o-is-too-large-a php - preg_match(): Compilation failed: character value in \x{} or \o{} is too large at offset 27 on line number 25 - Stack Overflow]</ref>. | ||
Line 505: | Line 522: | ||
</pre> | </pre> | ||
Explanation: \S matches any non-whitespace character, and \r\n | Explanation: \S matches any non-whitespace character, and \r\n are the [[Return symbol | line-break characters]]. [^\S\r\n] therefore matches a character that is neither non-whitespace nor a line break. In other words, it finds whitespace other than line breaks. | ||
Using Sublime Text (references<ref>[http://www.techrepublic.com/blog/microsoft-office/quickly-replace-multiple-space-characters-with-a-tab-character/ Quickly replace multiple space characters with a tab character - TechRepublic]</ref> <ref>[http://stackoverflow.com/questions/3469080/match-whitespace-but-not-newlines-perl regex - Match whitespace but not newlines (Perl) - Stack Overflow]</ref>) | Using Sublime Text (references<ref>[http://www.techrepublic.com/blog/microsoft-office/quickly-replace-multiple-space-characters-with-a-tab-character/ Quickly replace multiple space characters with a tab character - TechRepublic]</ref> <ref>[http://stackoverflow.com/questions/3469080/match-whitespace-but-not-newlines-perl regex - Match whitespace but not newlines (Perl) - Stack Overflow]</ref>) | ||
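The double negative in [^\S\r\n] may be easier to follow with a small example. The following is a minimal Python sketch, not taken from the original page; the helper name <code>collapse_inline_whitespace</code> and the choice of a tab as the replacement are assumptions made for illustration.

```python
import re

# [^\S\r\n]: a character that is not non-whitespace (so it is whitespace)
# and is neither \r nor \n.
INLINE_WS = re.compile(r'[^\S\r\n]+')


def collapse_inline_whitespace(s, replacement='\t'):
    """Replace each run of whitespace other than \\r and \\n with `replacement`."""
    return INLINE_WS.sub(replacement, s)


sample = 'a   b \t c\r\nd  e\n'
print(repr(collapse_inline_whitespace(sample)))
# 'a\tb\tc\r\nd\te\n'
```

The line breaks survive untouched, while each run of spaces and tabs within a line becomes a single tab.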
Line 548: | Line 565: | ||
=== Finding URLs in article text === | === Finding URLs in article text === | ||
[[Regular extract url from text]] | [[Regular extract url from text]] | ||
=== Finding numbers === | |||
See [[Data cleaning#Numeric]] | |||
=== Finding long numbers in article text === | === Finding long numbers in article text === | ||