[Google Spreadsheet] A convenient function to extract and display titles from the article list | IMPORTXML function

When you want to get all the page titles from the list of URLs at once, you can use the convenient function “IMPORTXML function” unique to Google Sheets.

What is the IMPORTXML function?

=IMPORTXML(“URL”, “XPath query”)

A function that can extract specific data from a website page using specific conditions (XPath query). It is commonly called ” scraping “. *Scraping is prohibited on many sites, so be careful when using it. Basically, it is safer to think that it is NG to use it on other people’s sites.

What is XPath?

XPath stands for “XML Path Language” and is a language syntax for specifying specific parts of a document that conforms to the markup language XML .

HTML has a tree structure of “html” → “body” → “h1” → …, and XPath describes the hierarchy while separating it with a slash “/”, and specifies another node from the reference node. It is characterized by being able to extract data by

Example: /html/body/div/div/…

Using the IMPORTXML function

It is how to use the IMPORTXML function of the main subject. This is the description method for extracting the title. Specify as follows.

=IMPORTXML("http://example.com","//title")

How, you can get the title of the web page just by this.

Loaded! After that, you can extract the titles of all pages by copying and pasting (or autofilling) all the way down.

All page titles are now available!

It’s surprising that you can get the title of the web page just by doing this.

Precautions for use and summary

Note that if the amount of data is too large, the display will remain “Loading…” and will not be loaded, resulting in an error. Is it tough when it comes to about 300 cases or more? That’s what it feels like. After reading, it seems better to delete the function by pasting the value.

Also, scraping is basically prohibited on many sites, so be careful when using it. Misuse may cause trouble for other sites. Basically, it is safer to think that it is NG to use it on other people’s sites.

There is a list of URLs, and it takes a long time to check the titles one by one and paste them. This is useful when extracting titles from sitemaps, etc.

Let’s use the IMPORTXML function correctly and conveniently!

Please refer to it