# Introduction to dplyr: Setup and Data Preparation

dplyr is an R package for data manipulation, and a great one at that. It makes your data analysis process a lot more efficient. Even better, it’s fairly simple to learn and start applying immediately to your work! Often, with just a few lines of code, your data becomes much easier to dissect and analyze. For these reasons, it is a foundational skill for any aspiring data scientist.

It can be surprising how much a handful of easy-to-learn functions can streamline the data analysis process, and that is certainly the case with dplyr. In this series, we will teach you how to use this package to munge data, demonstrating with a Kaggle dataset on wine ratings.

In Part 1 of this series, we will show you how to:

– Obtain the R programming language

– Install RStudio

– Load the wine ratings dataset from Kaggle

– Install the ggplot2 and dplyr packages

Introduction to R:

Tools needed:

RStudio:

R Programming Language:

https://cloud.r-project.org/

Dataset:

https://www.kaggle.com/zynicide/wine-reviews/data

dplyr package:

install.packages("dplyr")

ggplot2 package:

install.packages("ggplot2")

GitHub:

https://github.com/datasciencedojo/tutorials
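Once R and RStudio are installed, the setup steps listed above can be sketched in a single script. This is a minimal sketch, not the exact code used in the series. It assumes you have already downloaded a CSV from the Kaggle page into your working directory. The file name `winemag-data_first150k.csv` and the variable names `required_packages`, `wine_path`, and `wine` are assumptions, so adjust them to match your own download.

```r
# Packages used in this series
required_packages <- c("dplyr", "ggplot2")

# Install only the packages that are not already on this machine
missing_packages <- setdiff(required_packages, rownames(installed.packages()))
if (length(missing_packages) > 0) {
  install.packages(missing_packages)
}

# Attach both packages for the current R session
library(dplyr)
library(ggplot2)

# Assumed file name and location; change this to wherever you saved the Kaggle CSV
wine_path <- "winemag-data_first150k.csv"

if (file.exists(wine_path)) {
  # Read the CSV into a data frame, keeping text columns as character
  wine <- read.csv(wine_path, stringsAsFactors = FALSE)
  # glimpse() comes with dplyr and prints a compact overview of the columns
  glimpse(wine)
} else {
  message("File not found: ", wine_path,
          ". Download it from the Kaggle page and update wine_path.")
}
```

Checking for missing packages first means the script can be rerun without reinstalling anything, and the `file.exists()` guard gives a readable message instead of an error if the dataset is not where the script expects it.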

Be sure to also check our accompanying blog post here:

https://blog.datasciencedojo.com/explorations-with-dplyr/
