In this tip, we scrape the <head> element of a web page to extract the key information about an article.
On the backend, Node/Express and the Cheerio npm module do the actual scraping, while jQuery drives the frontend. Cheerio is a library that parses HTML and lets you traverse and query it with jQuery-like syntax. The image below shows how the demo works:
The demo above scrapes the <meta> tags from the page at the given URL and uses their attributes to identify the desired metadata. In this specific example, the scraped meta tags are shown below:
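As a rough illustration of how attributes can identify the desired metadata, here is a minimal sketch. It assumes the tags have already been read into plain attribute objects and that the page identifies its tags with Open Graph style "property" attributes, with "name" as a fallback. The pickMetadata helper, the FIELD_KEYS table, and the sample tags are all hypothetical and are not the demo's actual output.

```javascript
// Hypothetical mapping from an output field to the meta keys that may carry it,
// listed in priority order. Adjust to the tags your target pages actually use.
const FIELD_KEYS = {
  title: ['og:title', 'twitter:title'],
  description: ['og:description', 'description'],
  image: ['og:image', 'twitter:image'],
  url: ['og:url'],
};

// tags: array of attribute objects, e.g. { property: 'og:title', content: '...' }
function pickMetadata(tags) {
  // Index each tag's content by whichever identifying attribute it has.
  const byKey = new Map();
  for (const tag of tags) {
    const key = tag.property || tag.name;
    if (key && tag.content !== undefined && !byKey.has(key)) {
      byKey.set(key, tag.content);
    }
  }

  // For every field, take the first key that is present; otherwise null.
  const result = {};
  for (const [field, keys] of Object.entries(FIELD_KEYS)) {
    const match = keys.find((k) => byKey.has(k));
    result[field] = match ? byKey.get(match) : null;
  }
  return result;
}

// Sample input, made up for illustration.
const sample = [
  { charset: 'utf-8' },
  { name: 'description', content: 'A short summary.' },
  { property: 'og:title', content: 'Example Article' },
  { property: 'og:image', content: 'https://example.com/cover.png' },
];
console.log(pickMetadata(sample));
```

Tags without a "property" or "name" attribute (such as the charset tag) are skipped, and fields with no matching tag come back as null so the caller can decide how to display a missing value.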
The actual scraping takes place in the "/routes/scrape.js" file shown below. Here, we fetch the HTML of the specified page using the Axios HTTP client library, then find all <meta> tags in that content.
The frontend is a simple jQuery app with Bootstrap for styling. When the user clicks the "Scrape!" button, we make an AJAX request to our scrape.js endpoint, which runs the web scraper and returns the article's metadata. In the AJAX callback, we populate the content area with that metadata:
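The click handler and callback could be sketched as follows. The element IDs (#scrape-btn, #url-input, #content), the /scrape URL, and the { meta: [...] } response shape are assumptions for illustration, as are the renderMeta and escapeHtml helpers.

```javascript
// public/app.js: an illustrative sketch; IDs and the endpoint URL are assumptions.

// Escape text before placing it in HTML, since scraped content is untrusted.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Turn an array of meta tag attribute objects into Bootstrap list-group markup.
function renderMeta(tags) {
  const items = tags
    .filter((tag) => tag.content !== undefined && (tag.property || tag.name))
    .map((tag) => {
      const key = escapeHtml(tag.property || tag.name);
      return `<li class="list-group-item"><strong>${key}</strong>: ${escapeHtml(tag.content)}</li>`;
    });
  return `<ul class="list-group">${items.join('')}</ul>`;
}

// Wire up the page only when running in a browser with jQuery loaded.
if (typeof window !== 'undefined' && window.jQuery) {
  const $ = window.jQuery;
  $(function () {
    $('#scrape-btn').on('click', function () {
      const url = $('#url-input').val();
      // Ask the backend to scrape the page, then fill the content area.
      $.getJSON('/scrape', { url })
        .done((data) => $('#content').html(renderMeta(data.meta)))
        .fail(() => $('#content').text('Scraping failed.'));
    });
  });
}
```

Keeping the rendering in a plain function that returns a string means it can be checked without a browser, and escaping the scraped values avoids injecting markup from the target page into our own.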