Web Scraping With JavaScript and Cheerio

Turbo 360
turbo360
June 13th, 2020

In this tip, we scrape the <head> tag of a page to get the key information about an article:

  • Title
  • Description
  • Image
  • URL

We use the Cheerio NPM module with Node/Express on the backend to do the actual scraping, and jQuery on the frontend. Cheerio is a web scraping library that uses jQuery-like syntax to traverse the DOM. The image below shows how the demo works:


The demo above scrapes the <meta> tags from the URL and uses their attributes to identify the desired metadata. In this specific example, the scraped meta tags are shown below:
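
As a rough sketch of how attributes can identify the desired metadata, the snippet below maps each article field to the <meta> keys that commonly carry it. The key names (Open Graph "og:*", Twitter card tags, and the plain "description" tag) and the pickArticleInfo helper are assumptions made for illustration; the tags used by the demo may differ.

```javascript
// Hypothetical mapping from article fields to the <meta> keys that often hold them.
// Keys are tried in the order listed, so Open Graph values win when present.
const FIELD_KEYS = {
  title: ['og:title', 'twitter:title'],
  description: ['og:description', 'description'],
  image: ['og:image', 'twitter:image'],
  url: ['og:url'],
};

// metaTags is an array of plain attribute objects, one per <meta> tag,
// e.g. { property: 'og:title', content: 'Some headline' }.
function pickArticleInfo(metaTags) {
  const info = {};
  for (const [field, keys] of Object.entries(FIELD_KEYS)) {
    info[field] = null;
    for (const key of keys) {
      // A tag is identified by its "property" attribute, or by "name" as a fallback.
      const tag = metaTags.find(
        (t) => (t.property || t.name) === key && t.content
      );
      if (tag) {
        info[field] = tag.content;
        break;
      }
    }
  }
  return info;
}
```

Fields with no matching tag come back as null, so the caller can decide how to display missing values.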

 

The actual scraping takes place in the "/routes/scrape.js" file shown below. Here, we fetch the HTML of the specified page using the Axios request library, then find all <meta> tags in this content.

routes/scrape.js
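
A minimal sketch of what such a route could look like follows; it is not the demo's actual source. The function names, the "url" query parameter, and the JSON response shape are all assumptions. It fetches the page with Axios, loads the HTML into Cheerio, and collects the attributes of every <meta> tag.

```javascript
// Collect every <meta> tag into a lookup table keyed by its
// "property" (or "name") attribute. `$` is a loaded Cheerio document,
// i.e. the value returned by cheerio.load(html).
function extractMeta($) {
  const meta = {};
  $('meta').toArray().forEach((el) => {
    const attrs = el.attribs || {};
    const key = attrs.property || attrs.name;
    if (key && attrs.content) {
      meta[key] = attrs.content;
    }
  });
  return meta;
}

// Fetch a page and return its <meta> lookup table.
async function scrapePage(url) {
  // Required lazily so extractMeta stays usable on its own,
  // without the network libraries being installed.
  const axios = require('axios');
  const cheerio = require('cheerio');

  const response = await axios.get(url); // response.data holds the HTML string
  const $ = cheerio.load(response.data);
  return extractMeta($);
}

// Express-style handler. Assumed to be mounted as GET /scrape?url=...,
// for example with router.get('/', scrapeHandler).
async function scrapeHandler(req, res) {
  const url = req.query.url;
  if (!url) {
    return res.status(400).json({ error: 'Missing "url" query parameter' });
  }
  try {
    res.json(await scrapePage(url));
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
}

module.exports = { extractMeta, scrapePage, scrapeHandler };
```

Keeping extractMeta separate from the network call makes the parsing step easy to try against a saved HTML string before pointing it at live pages.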

The frontend is a simple jQuery app with Bootstrap for styling. When the user clicks the "Scrape!" button, we make an AJAX request to our scrape.js endpoint, which returns the article's metadata using our web scraper. In the AJAX callback, we populate the content area with the metadata:

views/index.mustache
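
The script portion of such a view might look roughly like the sketch below. The element IDs ("#scrape-btn", "#url-input", "#content"), the "/scrape" endpoint path, and the { title, description, image, url } response shape are hypothetical, since the original markup is not reproduced here.

```javascript
// Escape text before inserting it into HTML, since scraped content is untrusted.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the markup for the content area from the scraper's JSON response.
// Assumed response shape: { title, description, image, url }.
function renderMeta(meta) {
  // Only link to http(s) URLs so a scraped "javascript:" URL is never clickable.
  const safeUrl = /^https?:\/\//i.test(meta.url || '') ? meta.url : null;
  return [
    '<h3>' + escapeHtml(meta.title || '') + '</h3>',
    '<p>' + escapeHtml(meta.description || '') + '</p>',
    meta.image ? '<img src="' + escapeHtml(meta.image) + '">' : '',
    safeUrl
      ? '<a href="' + escapeHtml(safeUrl) + '">' + escapeHtml(safeUrl) + '</a>'
      : '',
  ].join('\n');
}

// Wire up the button only when running in a browser with jQuery loaded.
if (typeof window !== 'undefined' && window.jQuery) {
  const $ = window.jQuery;
  $('#scrape-btn').on('click', function () {
    $.ajax({
      url: '/scrape', // hypothetical endpoint path
      data: { url: $('#url-input').val() },
      dataType: 'json',
    })
      .done(function (meta) {
        $('#content').html(renderMeta(meta));
      })
      .fail(function () {
        $('#content').text('Could not scrape that URL.');
      });
  });
}
```

Building the markup in a separate renderMeta function keeps the escaping in one place and leaves the click handler short.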
