Extract URLs ([https://en.wikipedia.org/wiki/Uniform_Resource_Locator Uniform Resource Locator]; in Chinese, [https://zh.wikipedia.org/zh-tw/%E7%BB%9F%E4%B8%80%E8%B5%84%E6%BA%90%E5%AE%9A%E4%BD%8D%E7%AC%A6 統一資源定位符]) or [https://zh.wikipedia.org/zh-tw/%E5%9F%9F%E5%90%8D domain names] from article text.
== Extracting full URLs from article text ==
=== Extracting full URLs with Google Sheets ===
* (optional) Step 1: Install [https://workspace.google.com/marketplace/app/extract_urls/143780651832 Extract URLs - Google Workspace Marketplace]: "The application extracts links and converts them to the HYPERLINK formula" {{Gd}}
* (optional) Step 2: Use the [https://support.microsoft.com/zh-tw/office/formulatext-%E5%87%BD%E6%95%B8-0a786771-54fd-4ae2-96ee-09cda35439c8 FORMULATEXT function - Microsoft Support] to read the HYPERLINK formula as text.
* Step 3: Use the Google Sheets regular-expression ([[Regular expression]]) function [https://support.google.com/docs/answer/3098244?hl=zh-Hant REGEXEXTRACT] to extract the first URL from the article text.
<pre>
=REGEXEXTRACT(A1, "(http[s]?://[a-zA-Z0-9\-_\\._~\:\/\?#\[\]@\!\$&'\(\)\*\+,;\=%]+)")
</pre>
Detailed instructions (in Chinese): [https://errerrors.blogspot.com/2023/10/how-to-quickly-extract-links-from-google-sheets.html How to quickly extract links from Google Sheets]
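For checking results outside the spreadsheet, the same idea can be sketched in Python. This is an illustrative equivalent rather than part of the Google Sheets workflow: the names <code>URL_PATTERN</code> and <code>extract_first_url</code> are hypothetical, and the pattern is a simplified variant of the character class used in the formula above.

```python
import re

# Assumption: a simplified version of the character class in the formula;
# it accepts the RFC 3986 characters plus "%" for percent-encoded bytes.
URL_PATTERN = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def extract_first_url(text):
    """Return the first http(s) URL found in text, or None if there is none."""
    match = URL_PATTERN.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(extract_first_url("See https://example.com/a?b=1 and http://example.org"))
```

Because <code>.</code> and <code>,</code> are valid URL characters, both this sketch and the formula keep punctuation that directly follows a URL at the end of a sentence.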
=== Removing URLs from article text with Google Sheets ===
Use the [https://support.google.com/docs/answer/3098245?hl=zh-Hant REGEXREPLACE] function:
<pre>
=REGEXREPLACE(A1, "(http[s]?://[a-zA-Z0-9\-_\\._~\:\/\?#\[\]@\!\$&'\(\)\*\+,;\=%]+)", "")
</pre>
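A comparable removal step can be sketched in Python with <code>re.sub</code>. The function name <code>remove_urls</code> and the simplified pattern are assumptions made for illustration; note that REGEXREPLACE and <code>re.sub</code> both replace every match, not only the first one.

```python
import re

# Assumption: simplified URL character class, not the exact spreadsheet pattern.
URL_PATTERN = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def remove_urls(text):
    """Delete every http(s) URL from text; surrounding spaces are left as-is."""
    return URL_PATTERN.sub("", text)


if __name__ == "__main__":
    print(remove_urls("Read https://example.com/page now"))
```

The spaces on either side of a removed URL remain, so a follow-up whitespace cleanup may be wanted.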
# According to [http://tools.ietf.org/html/rfc3986/ RFC 3986], [http://tools.ietf.org/html/rfc3986#section-2 Section 2: Characters], the characters allowed in a URL are {{kbd | key = <nowiki>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=</nowiki>}}; any other character must be percent-encoded, that is, written with the percent sign %. <ref>[http://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid validation - Which characters make a URL invalid? - Stack Overflow]</ref>
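The character rule above can be illustrated with a short Python sketch. The helper name <code>percent_encode_disallowed</code> is hypothetical, and the sketch assumes that any <code>%</code> already in the input belongs to an existing percent-encoded sequence, so it is passed through unchanged.

```python
import string
from urllib.parse import quote

# Characters listed in the note above: unreserved plus reserved (RFC 3986, Section 2).
UNRESERVED = string.ascii_letters + string.digits + "-._~"
RESERVED = ":/?#[]@!$&'()*+,;="
ALLOWED = set(UNRESERVED + RESERVED)


def percent_encode_disallowed(url):
    """Percent-encode every character that is outside the allowed set."""
    return "".join(
        # Assumption: "%" is kept because it is taken to start an existing escape.
        ch if ch in ALLOWED or ch == "%" else quote(ch, safe="")
        for ch in url
    )


if __name__ == "__main__":
    print(percent_encode_disallowed("https://example.com/a b"))
```

Non-ASCII characters are encoded as their UTF-8 bytes, which is the default behaviour of <code>urllib.parse.quote</code>.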
== Extracting full URLs from HTML text ==
=== Extracting full URLs with Google Sheets ===
# Use [https://extract-urls.contributor.pw/ EXTRACT URLs] to extract links and convert them to HYPERLINK formulas.
# Use the [https://support.google.com/docs/answer/9365792?hl=en FORMULATEXT function - Google Docs Editors Help] to return each formula as text.
# Use the [https://support.google.com/docs/answer/3098244?hl=zh-Hant REGEXEXTRACT] function to extract the URL from the resulting cell:
<pre>
=REGEXEXTRACT(A1, "(http[s]?://[a-zA-Z0-9\-_\\._~\:\/\?#\[\]@\!\$&'\(\)\*\+,;\=%]+)")
</pre>
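The last step, pulling the URL out of the formula text, can be sketched in Python as follows. The sketch assumes the formula text looks like <code>=HYPERLINK("url","label")</code>, with the URL as the first double-quoted argument; the function name <code>url_from_hyperlink_formula</code> is hypothetical.

```python
import re

# Assumption: the formula text has the URL as its first double-quoted argument.
HYPERLINK_PATTERN = re.compile(r'=HYPERLINK\(\s*"([^"]+)"', re.IGNORECASE)


def url_from_hyperlink_formula(formula_text):
    """Return the first argument of a HYPERLINK formula, or None if absent."""
    match = HYPERLINK_PATTERN.search(formula_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(url_from_hyperlink_formula('=HYPERLINK("https://example.com/page","Example")'))
```

Only the first argument is inspected, so the sketch does not depend on whether the argument separator is a comma or a semicolon.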
Further reading:
* [https://support.google.com/docs/thread/34116680/extract-url-from-pasted-external-text-with-link-embedded?hl=en Extract URL from pasted external text with link embedded - Google Docs Editors Community]
== Extracting the domain part of URLs from article text ==
[[Extract domain from text in Mandarin | Extract the domain part of URLs from article text]]
== Extracting URLs of specific file types from article text ==
=== Extracting URLs of specific file types with Sublime Text ===
The following syntax works in [https://www.sublimetext.com/ Sublime Text]
* Start the download task
== Data validation: does the article text contain a URL? ==
Use the Google Sheets [https://support.google.com/docs/answer/3098292?hl=zh-Hant REGEXMATCH] function. It returns TRUE if the text matches the regular expression and FALSE otherwise.
<pre>
FALSE
</pre>
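The same TRUE/FALSE validation can be approximated in Python. This is a minimal sketch with a hypothetical function name, <code>contains_url</code>, and a simplified pattern; like REGEXMATCH, it reports a match anywhere in the text rather than requiring the whole text to be a URL.

```python
import re

# Assumption: simplified URL character class, not the exact spreadsheet pattern.
URL_PATTERN = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def contains_url(text):
    """Return True if text contains at least one http(s) URL, else False."""
    return URL_PATTERN.search(text) is not None


if __name__ == "__main__":
    print(contains_url("Docs live at https://example.com/docs"))
    print(contains_url("No link in this sentence"))
```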
== References ==
<references />
[[Category: Regular expression]] [[Category: Data Science]] [[Category: String manipulation]]