Model Building – Introduction to Text Analytics with R Part 4

We are now ready to build our first model in RStudio. In this part of the series, we will cover:

– Correcting column names derived from tokenization to ensure smooth model training.
– Using caret to set up stratified cross validation.
– Using the doSNOW package to accelerate caret machine learning training by using multiple CPUs in parallel.
– Using caret to train single decision trees on text features and tune the trained model for optimal accuracy.
– Evaluating the results of the cross validation process.
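
The steps listed above can be sketched end to end on a toy data set. This is a minimal sketch under stated assumptions, not the code from the video: the token columns, the simulated labels, the seed, the choice of 10 folds repeated 3 times, the 2-worker cluster, and `tuneLength = 7` are all illustrative. Only `make.names()`, `createMultiFolds()`, `trainControl()`, `train()`, `makeCluster()`, `registerDoSNOW()`, and `stopCluster()` are real functions from base R, caret, and doSNOW (via snow).

```r
library(caret)    # createMultiFolds(), trainControl(), train()
library(doSNOW)   # registerDoSNOW(); also loads snow for makeCluster()

set.seed(32984)   # arbitrary seed, only for reproducibility of the toy data

# Hypothetical stand-in for a tokenized document-term matrix.
# Some tokens ("4", "if", "call-now") are not valid R column names.
n       <- 200
Label   <- factor(rep(c("ham", "spam"), times = c(170, 30)))
is.spam <- Label == "spam"
tokens  <- data.frame(
  "free"     = rbinom(n, 1, ifelse(is.spam, 0.70, 0.05)),
  "4"        = rbinom(n, 1, ifelse(is.spam, 0.40, 0.10)),
  "if"       = rbinom(n, 1, 0.30),
  "call-now" = rbinom(n, 1, ifelse(is.spam, 0.60, 0.02)),
  check.names = FALSE   # keep the raw token strings as column names
)

# 1. Correct the column names so formulas and model code can use them.
names(tokens) <- make.names(names(tokens), unique = TRUE)
names(tokens)   # "free" "X4" "if." "call.now"

train.df <- cbind(Label = Label, tokens)

# 2. Stratified folds: createMultiFolds() samples within each class, so
#    every fold keeps roughly the same ham/spam ratio as the full data.
cv.folds <- createMultiFolds(train.df$Label, k = 10, times = 3)
cv.cntrl <- trainControl(method = "repeatedcv", number = 10,
                         repeats = 3, index = cv.folds)

# 3. Register a socket cluster so caret runs the resamples in parallel.
cl <- makeCluster(2, type = "SOCK")   # 2 workers is an assumption
registerDoSNOW(cl)

# 4. Train single decision trees (rpart) and let caret try several
#    values of the complexity parameter cp.
rpart.cv <- train(Label ~ ., data = train.df, method = "rpart",
                  trControl = cv.cntrl, tuneLength = 7)

stopCluster(cl)   # always release the workers when training is done

# 5. Inspect the cross-validated accuracy for each cp value tried.
rpart.cv
```

Printing the fitted `train` object shows the resampled accuracy and kappa for each candidate `cp`, along with the value caret selected. On real text data the number of token columns is far larger, which is where running the resamples in parallel pays off.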

Kaggle Dataset:
Kaggle Spam Data Set

The data and R code are available here

Full Series:
Introduction to Text Analytics with R



About The Author
- Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.

