Intro To Web Scraping

Turbo 360

Sometimes the data you want for your project just isn't available in a convenient JSON or CSV format. This is where web scraping comes in: it lets us collect the data ourselves, but it requires digging through the HTML to identify the tags we want to extract. In this series, we scrape a simple site for NBA scores and return the results in a standard JSON format.


Getting the HTML

12 Minutes | In this video we scaffold the Node project, then make a request to www.basketball-reference.com to confirm that we receive valid HTML, and render that HTML on localhost. We use the npm packages Superagent and Cheerio for this.
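A minimal sketch of this first step, assuming Node 18 or later. The course uses Superagent for the request; to keep the sketch free of dependencies, it swaps in Node's built-in `fetch` and `http` module instead. The helper names (`buildScoresUrl`, `fetchHtml`, `serveScores`), the port, and the `/boxscores/?month=&day=&year=` URL pattern are assumptions for illustration, not taken from the course.

```javascript
const http = require("node:http");

// Hypothetical helper: builds a scores URL for one date.
// The /boxscores/?month=&day=&year= query pattern is an assumption about the site.
function buildScoresUrl(year, month, day) {
  const params = new URLSearchParams({
    month: String(month),
    day: String(day),
    year: String(year),
  });
  return `https://www.basketball-reference.com/boxscores/?${params}`;
}

// Requests the page and returns its raw HTML as a string.
// Uses the global fetch available in Node 18+ (the course uses Superagent instead).
async function fetchHtml(url) {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.text();
}

// Serves the fetched page on localhost so we can confirm the HTML looks valid in a browser.
function serveScores(url, port = 3000) {
  const server = http.createServer(async (req, res) => {
    try {
      const html = await fetchHtml(url);
      res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
      res.end(html);
    } catch (err) {
      // Report upstream failures instead of crashing the server.
      res.writeHead(502, { "Content-Type": "text/plain; charset=utf-8" });
      res.end(String(err.message));
    }
  });
  server.listen(port);
  return server;
}
```

For example, `serveScores(buildScoresUrl(2019, 1, 15))` would fetch that day's page on each request and serve it at http://localhost:3000.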

Extracting Data

19 Minutes | In this video we extract data from the HTML received from basketball-reference.com. We use the Cheerio package to load the HTML into a jQuery-like object, which makes it easy for us to comb through the page and find the elements we need. Using this, we identify the winner and loser of each NBA game on a specified date.
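A rough, dependency-free sketch of pairing each game's winner and loser. It assumes each game sits in a `<table class="teams">` containing `<tr class="winner">` and `<tr class="loser">` rows whose first link is the team name; that markup is an assumption about the site, not confirmed by the course. The course does this with Cheerio selectors; this sketch swaps in regular expressions purely so it runs without npm packages, and regexes like these are brittle against real-world HTML. The function name `extractGames` is hypothetical.

```javascript
// Assumed markup for one game (not verified against the live site):
// <table class="teams"><tbody>
//   <tr class="loser"><td><a href="...">Team A</a></td><td class="right">101</td></tr>
//   <tr class="winner"><td><a href="...">Team B</a></td><td class="right">110</td></tr>
// </tbody></table>
function extractGames(html) {
  const tablePattern = /<table class="teams">([\s\S]*?)<\/table>/g;
  const rowPattern = /<tr class="(winner|loser)">([\s\S]*?)<\/tr>/g;
  // The first <a> in a row is assumed to hold the team name.
  const teamPattern = /<a\b[^>]*>([^<]+)<\/a>/;

  const games = [];
  // One iteration per game table.
  for (const [, tableHtml] of html.matchAll(tablePattern)) {
    const game = {};
    // Record the team name under "winner" or "loser" based on the row's class.
    for (const [, outcome, rowHtml] of tableHtml.matchAll(rowPattern)) {
      const team = rowHtml.match(teamPattern);
      if (team) game[outcome] = team[1].trim();
    }
    // Skip tables that do not contain both a winner and a loser.
    if (game.winner && game.loser) games.push(game);
  }
  return games;
}
```

In a Cheerio version, the equivalent lookups would be selectors such as `table.teams`, `tr.winner a` and `tr.loser a`, which are easier to read and far more robust than these patterns.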

Rendering Data as JSON

13 Minutes | In this video we scrape the scores for each team, then render the final results as JSON rather than HTML, so that any other application can consume the data in a consistent format.
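A minimal sketch of attaching scores and serializing the results as JSON. It assumes the same kind of markup as a simple per-game table: `<table class="teams">` with `<tr class="winner">`/`<tr class="loser">` rows, the team name in the first link and the score in a `<td class="right">` cell. That markup, the function names `extractScoredGames` and `renderJson`, and the `{ date, games }` output shape are all assumptions for illustration. Regular expressions stand in for the course's Cheerio code only so the sketch runs without dependencies.

```javascript
// Assumed row markup (not verified against the live site):
// <tr class="winner"><td><a href="...">Team</a></td><td class="right">110</td>...</tr>
function extractScoredGames(html) {
  const tablePattern = /<table class="teams">([\s\S]*?)<\/table>/g;
  const rowPattern = /<tr class="(winner|loser)">([\s\S]*?)<\/tr>/g;
  const teamPattern = /<a\b[^>]*>([^<]+)<\/a>/;
  // Only cells whose class is exactly "right" are treated as score cells.
  const scorePattern = /<td class="right">\s*(\d+)\s*<\/td>/;

  const games = [];
  for (const [, tableHtml] of html.matchAll(tablePattern)) {
    const game = {};
    for (const [, outcome, rowHtml] of tableHtml.matchAll(rowPattern)) {
      const team = rowHtml.match(teamPattern);
      const score = rowHtml.match(scorePattern);
      if (team && score) {
        // Store scores as numbers so downstream consumers need no parsing.
        game[outcome] = { team: team[1].trim(), score: Number(score[1]) };
      }
    }
    if (game.winner && game.loser) games.push(game);
  }
  return games;
}

// Serializes the results; the { date, games } envelope is a hypothetical shape.
function renderJson(date, games) {
  return JSON.stringify({ date, games }, null, 2);
}
```

In a Node `http` handler, the string returned by `renderJson` would be sent with a `Content-Type: application/json` header in place of the raw HTML.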
