Extract url from text: Difference between revisions

Revision as of 10:18, 28 September 2016

Use a regular expression to extract URLs from text.

Use the Google Sheets REGEXEXTRACT function to extract the first URL from a piece of text.

=REGEXEXTRACT(A1, "(http[s]?://[a-zA-Z0-9\-_\\._~\:\/\?#\[\]@\!\$&'\(\)\*\+,;\=%]+)")

Input:

Yahoo! 新聞 https://tw.news.yahoo.com/abc

Output:

https://tw.news.yahoo.com/abc

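Explanation:

A URL may begin with either http:// or https://, so the pattern starts with http[s]?://
According to RFC 3986, the characters allowed in a URL are ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=, and any other character must be percent-encoded using the percent sign %. ^[1]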
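
The same idea can be tried outside a spreadsheet. The following is a minimal sketch in Python, assuming Python's `re` module as a stand-in for the RE2 engine behind REGEXEXTRACT; the names `URL_PATTERN` and `extract_first_url` are hypothetical and chosen only for illustration.

```python
import re

# Character class built from the RFC 3986 unreserved and reserved characters,
# plus "%" for percent-encoded bytes. Python's re module is used here as an
# assumption; Google Sheets itself evaluates the pattern with RE2.
URL_PATTERN = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def extract_first_url(text):
    """Return the first URL found in text, or None if there is none."""
    match = URL_PATTERN.search(text)
    return match.group(0) if match else None


print(extract_first_url("Yahoo! 新聞 https://tw.news.yahoo.com/abc"))
# prints: https://tw.news.yahoo.com/abc
```

Because characters such as `.` and `,` are legal inside a URL, a URL that is immediately followed by sentence punctuation will be matched together with that punctuation.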

=REGEXEXTRACT(A1, "(http[s]?\://[^/]+)")

Input:

Yahoo! 新聞 https://tw.news.yahoo.com/abc

Output:

https://tw.news.yahoo.com

Explanation:
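
After the scheme, `[^/]+` consumes every character up to the next slash, so the match ends at the host name and does not include the slash that begins the path. A minimal sketch in Python, again assuming the `re` module in place of RE2; `ORIGIN_PATTERN` and `extract_origin` are hypothetical names used only for illustration.

```python
import re

# Scheme followed by everything up to (but not including) the next "/".
ORIGIN_PATTERN = re.compile(r"https?://[^/]+")


def extract_origin(text):
    """Return the scheme and host of the first URL in text, or None."""
    match = ORIGIN_PATTERN.search(text)
    return match.group(0) if match else None


print(extract_origin("Yahoo! 新聞 https://tw.news.yahoo.com/abc"))
# prints: https://tw.news.yahoo.com
```

Note that `[^/]` also matches spaces, so when the URL has no path the match runs on into the text that follows it; a stricter class such as `[^/\s]+` may be preferable in that case.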

@@ Line 4: / Line 4: @@
 Use the Google Sheets [https://support.google.com/docs/answer/3098244?hl=zh-Hant REGEXEXTRACT] function to extract the first URL from a piece of text.
 <pre>
-=REGEXEXTRACT(A1, "(http[s]?://[a-zA-Z0-9\-_\\._~\:\/\?#\[\]@\!\$&'\(\)\*\+,;\=%]+)\b?")
+=REGEXEXTRACT(A1, "(http[s]?://[a-zA-Z0-9\-_\\._~\:\/\?#\[\]@\!\$&'\(\)\*\+,;\=%]+)")
 </pre>
@@ Line 23: / Line 23: @@
 == Extract the domain part of a URL ==
 <pre>
-=REGEXEXTRACT(A1, "(http[s]?\://[^/]+)\b?")
+=REGEXEXTRACT(A1, "(http[s]?\://[^/]+)")
 </pre>