R Tutorial: Automated Web Scraping Using rvest

In this R tutorial, we show you how to use rvest to scrape the web automatically and on a schedule, so you can analyze data that is timely or frequently updated. This talk was given by one of the instructors who teach our data science bootcamp!

R code, scripts, and supplemental items

Many blogs and tutorials teach you how to scrape data from a set of web pages once, and then you're done. But one-off web scraping is not useful for applications that need sentiment analysis of recent or timely content, that capture changing events and commentary, or that analyze trends in real time. One-off scraping of historical data makes a fun academic exercise, but it falls short when you need timely or frequently updated data.

Suppose you would like to tap into news sources to analyze political events that change by the hour, along with people's comments on those events. You could summarize the key discussions and debates in the comments, rate the overall sentiment of the comments, find the key themes in the headlines, track how events and commentary change over time, and more. To do this, you need a collection of recent political events or news scraped every hour.

What we’ll do:
We'll go through the process of writing standard web scraping commands in R using rvest, filtering for timely data, analyzing or summarizing key information in the text, and sending an email alert with the results of your analysis. We'll set up our script to run every hour so that text is scraped and analyzed periodically, letting you capture changing events and commentary or analyze trends in real time.
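The first three steps (scrape, filter for timely items, summarize) can be sketched in a few lines of R. This is a minimal sketch, not the script from the talk: it assumes a hypothetical page where each story sits in an <article> element with an <h2> headline and a <time datetime="..."> timestamp, and it parses an inline HTML string so it runs offline. The helper names scrape_headlines, filter_recent and top_words and the sample data are illustrative only; on a real site you would pass a URL to read_html and adjust the CSS selectors. The email alert and hourly scheduling steps are not shown.

```r
library(rvest)

# Hypothetical markup: real news sites use different tags and selectors.
page_html <- '
<html><body>
  <article><h2>Senate passes budget bill</h2>
    <time datetime="2017-06-01T09:15:00Z"></time></article>
  <article><h2>Budget talks stall again</h2>
    <time datetime="2017-06-01T11:40:00Z"></time></article>
  <article><h2>Old story from last week</h2>
    <time datetime="2017-05-25T08:00:00Z"></time></article>
</body></html>'

# Scrape each article's headline text and publication time into a data frame.
scrape_headlines <- function(html) {
  doc <- read_html(html)               # accepts a URL or literal HTML
  articles <- html_nodes(doc, "article")
  data.frame(
    headline = html_text(html_node(articles, "h2"), trim = TRUE),
    published = as.POSIXct(
      html_attr(html_node(articles, "time"), "datetime"),
      format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"
    ),
    stringsAsFactors = FALSE
  )
}

# Keep only items published within the last `hours` hours before `now`.
filter_recent <- function(df, now, hours = 1) {
  keep <- df$published >= now - hours * 3600 & df$published <= now
  df[keep, , drop = FALSE]
}

# Crude summary: the most frequent words (longer than 3 letters) in headlines.
top_words <- function(headlines, n = 3) {
  words <- tolower(unlist(strsplit(headlines, "[^A-Za-z]+")))
  words <- words[nchar(words) > 3]
  head(sort(table(words), decreasing = TRUE), n)
}

now <- as.POSIXct("2017-06-01 12:00:00", tz = "UTC")
recent <- filter_recent(scrape_headlines(page_html), now, hours = 3)
print(recent)
print(top_words(recent$headline))
```

Passing `now` in explicitly, rather than calling Sys.time() inside filter_recent, keeps the filter easy to test; in an hourly job you would pass Sys.time() and hours = 1.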

Let’s go fetch your data!

About The Author
Rebecca Merrett
- Rebecca holds a bachelor's degree in information and media from the University of Technology Sydney and a postgraduate diploma in mathematics and statistics from the University of Southern Queensland. She has a background in technical writing for game development and has written for tech publications.


You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>