Introduction to Machine Learning with R and caret

The R programming language is experiencing rapid increases in popularity and wide adoption across industries. This popularity is due, in part, to R’s huge collection of open source machine learning algorithms. If you are a data scientist working with R, the caret package (short for [C]lassification [A]nd [RE]gression [T]raining) is a must-have tool in your toolbelt. The caret package provides capabilities that are ubiquitous in all stages of the data science project lifecycle. Most important of all, caret provides a common interface for training, tuning, and evaluating more than 200 machine learning algorithms. Not surprisingly, caret is a sure fire way to accelerate your velocity as a data scientist!

In this presentation Dave Langer will provide an introduction to the caret package. The focus of the presentation will be using caret to implement some of the most common tasks of the data science project lifecycle and to illustrate incorporating caret into your daily work.

Attendees will learn how to:

• Create stratified random samples of data useful for training machine learning models.
• Train machine learning models using caret’s common interface.
• Leverage caret’s powerful features for cross-validation and hyperparameter tuning.
• Scale caret via use of multi-core, parallel training.
• Increase their knowledge of caret’s many features.

R code and accompanying dataset:

caret website:


About The Author
- Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>