Extract url from text: Difference between revisions
Jump to navigation
Jump to search
→從文章內容,擷取完整網址
m (→擷取特定檔案類型的網址) |
|||
| Line 48: | Line 48: | ||
# 網址可能是 <nowiki>http://</nowiki> 或 <nowiki>https://</nowiki> 開頭,所以條件是 {{kbd | key = <nowiki>http[s]?://</nowiki>}} | # 網址可能是 <nowiki>http://</nowiki> 或 <nowiki>https://</nowiki> 開頭,所以條件是 {{kbd | key = <nowiki>http[s]?://</nowiki>}} | ||
# 根據 [http://tools.ietf.org/html/rfc3986/ RFC 3986] 的 [http://tools.ietf.org/html/rfc3986#section-2 Section 2: Characters] 網址允許的文字有 {{kbd | key = <nowiki>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=</nowiki>}},其他文字則需要加上比例符號 % 編碼。 <ref>[http://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid validation - Which characters make a URL invalid? - Stack Overflow]</ref> | # 根據 [http://tools.ietf.org/html/rfc3986/ RFC 3986] 的 [http://tools.ietf.org/html/rfc3986#section-2 Section 2: Characters] 網址允許的文字有 {{kbd | key = <nowiki>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=</nowiki>}},其他文字則需要加上比例符號 % 編碼。 <ref>[http://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid validation - Which characters make a URL invalid? - Stack Overflow]</ref> | ||
== 從 HTML 文字,擷取完整網址 == | |||
=== 使用 Google sheet 擷取完整網址 === | |||
# Using [https://extract-urls.contributor.pw/ EXTRACT URLs] to extracts links and converts them to the HYPERLINK formula. | |||
# Using [https://support.google.com/docs/answer/9365792?hl=en FORMULATEXT function - Google Docs Editors Help] | |||
# Using [https://support.google.com/docs/answer/3098244?hl=zh-Hant REGEXEXTRACT] function to extract the Url from above cell | |||
<pre> | |||
=REGEXEXTRACT(A1, "(http[s]?://[a-zA-Z0-9\-_\\._~\:\/\?#\[\]@\!\$&'\(\)\*\+,;\=%]+)") | |||
</pre> | |||
參考資料: | |||
* [https://support.google.com/docs/thread/34116680/extract-url-from-pasted-external-text-with-link-embedded?hl=en Extract URL from pasted external text with link embedded - Google Docs Editors Community] | |||
== 從文章內容,擷取網址中的網域部分 == | == 從文章內容,擷取網址中的網域部分 == | ||