Extract title from webpage

From LemonWiki共筆

How to extract the title of a web page using Google Sheets.

Extract title using IMPORTXML in Google Sheets

Suggested approach

=IMPORTXML(A1, "//head/title")

Purpose: Assuming cell A1 contains the page's URL, this formula looks only for a <title> element directly inside the page's <head> section and imports its text. Because standard HTML places the document title inside <head>, this is the most direct and precise way to get the page title, and it ignores any stray <title> elements elsewhere on the page.

Other approaches

=INDEX(IMPORTXML(A1, "//title"), 1)

The second approach wraps the result in the `INDEX` function and keeps only the first matching `<title>` element. This is useful on pages that do not follow standard formatting conventions and contain more than one `<title>` tag; change the 1 to another number to select a different match.

=IMPORTXML(A1, "//title")

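The third method is the simplest and works for most standard HTML pages. However, "//title" matches every <title> element on the page, so if a page contains more than one, IMPORTXML returns several values and fills the cells below the formula.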
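To see why the three formulas can return different results, the following Python sketch imitates their matching rules locally using the standard library's html.parser. It is not how IMPORTXML works internally: the helper names (head_titles, all_titles, first_title) and the sample page with a stray <title> in <body> are hypothetical, and the parser only approximates XPath on reasonably well-behaved HTML.

```python
from __future__ import annotations

from html.parser import HTMLParser

# HTML elements that never have a closing tag; they are not pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TitleCollector(HTMLParser):
    """Records the text of every <title> element together with its parent tag."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []                      # currently open elements
        self.titles: list[tuple[str | None, str]] = []  # (parent tag, title text)
        self._parent: str | None = None
        self._buf: list[str] | None = None              # text of the <title> being read

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._parent = self.stack[-1] if self.stack else None
            self._buf = []
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._buf is not None:
            self.titles.append((self._parent, "".join(self._buf).strip()))
            self._buf = None
        if tag in self.stack:
            # Pop back to the matching open tag, dropping any unclosed children.
            while self.stack.pop() != tag:
                pass


def collect_titles(html: str) -> list[tuple[str | None, str]]:
    parser = TitleCollector()
    parser.feed(html)
    parser.close()
    return parser.titles


def head_titles(html: str) -> list[str]:
    """Approximates =IMPORTXML(A1, "//head/title"): only <title> directly under <head>."""
    return [text for parent, text in collect_titles(html) if parent == "head"]


def all_titles(html: str) -> list[str]:
    """Approximates =IMPORTXML(A1, "//title"): every <title>, in document order."""
    return [text for _, text in collect_titles(html)]


def first_title(html: str) -> str | None:
    """Approximates =INDEX(IMPORTXML(A1, "//title"), 1): the first match only."""
    matches = all_titles(html)
    return matches[0] if matches else None


# Hypothetical non-standard page with a stray <title> inside <body>.
SAMPLE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Example page</title></head>
<body>
  <title>Stray body title</title>
  <p>Hello</p>
</body>
</html>"""

print(head_titles(SAMPLE))  # ['Example page']
print(all_titles(SAMPLE))   # ['Example page', 'Stray body title']
print(first_title(SAMPLE))  # Example page
```

Replacing SAMPLE with the saved HTML of a real page gives a rough local preview of how many values each formula is likely to return, although IMPORTXML's own parser may treat malformed markup differently.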

Troubleshooting IMPORTXML errors

Error #N/A

  • Error details: "Failed to fetch URL: https://www.xxx.com" (Traditional Chinese interface: 無法擷取網址:https://www.xxx.com)
  • Root cause: Google Sheets could not download the page. A common reason is that the website blocks automated requests such as Google's crawler; a mistyped or unreachable URL in A1 produces the same error.

Error #ERROR!

  • Error details: "Formula parse error" (Traditional Chinese interface: 公式剖析錯誤。)
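  • Root cause: Google Sheets could not parse the formula itself. This usually points to how the second parameter is written, for example unbalanced quotation marks or typographic ("curly") quotes around the XPath query. A query that is valid XPath but points to the wrong place, such as =IMPORTXML(A1, "/html/body/title"), is a different problem: it matches nothing, because the <title> tag belongs inside <head>. The correct path is =IMPORTXML(A1, "/html/head/title").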
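To confirm which absolute path actually reaches the title, the following Python sketch checks both paths against a small hypothetical XHTML sample using the standard library's xml.etree.ElementTree. ElementTree supports only a limited XPath subset and requires well-formed XML, so this is an illustration of path matching, not a substitute for IMPORTXML on real pages.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page. Most real pages are not valid XML,
# so ElementTree is used here only to illustrate how the two paths match.
PAGE = """<html>
  <head><title>Example page</title></head>
  <body><p>Hello</p></body>
</html>"""

root = ET.fromstring(PAGE)  # root is the <html> element

# ElementTree paths are relative to the root element, so "head/title"
# corresponds to the absolute XPath "/html/head/title" and "body/title"
# to "/html/body/title".
correct = root.findall("head/title")
wrong = root.findall("body/title")

print([el.text for el in correct])  # ['Example page']
print(wrong)                        # []
```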

IMPORTXML returns multiple values

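  • Root cause: The page contains more than one element that matches the query, which can happen when the page does not follow standard formatting practices (for example, a second <title> tag outside <head>). IMPORTXML then returns every match, one per cell. To get a single value, make the second parameter more specific, such as "//head/title", or use the INDEX function to pick one match, e.g. =INDEX(IMPORTXML(A1, "//title"), 1).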
