Model Building – Introduction to Text Analytics with R Part 4
We are now ready to build our first model in RStudio. To do the model building, we will cover:
– Correcting column names derived from tokenization (for example, tokens that begin with a digit) so they are valid R names and model training does not fail.
– Using caret to set up stratified cross-validation.
– Using the doSNOW package to speed up caret training by running it on multiple CPU cores in parallel.
– Using caret to train single decision trees on text features and tune the trained model for optimal accuracy.
– Evaluating the results of the cross-validation process.
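The steps above can be sketched in R roughly as follows. This is a minimal sketch under stated assumptions, not the author's code: it assumes the caret, doSNOW and rpart packages are installed, and it uses a small synthetic token-count table (`toy.dfm`, with hypothetical labels) in place of the Kaggle spam data. The seed, the 2-worker cluster and `tuneLength = 7` are illustrative choices.

```r
# Minimal sketch on a synthetic stand-in for the spam data (hypothetical values).
library(caret)   # createMultiFolds(), trainControl(), train(), confusionMatrix()
library(doSNOW)  # registerDoSNOW(); also attaches snow (makeCluster) and foreach

set.seed(32984)
n <- 200

# Hypothetical token-count table: one row per message, one column per token.
toy.dfm <- as.data.frame(matrix(rpois(n * 4, lambda = 1), nrow = n))
names(toy.dfm) <- c("free", "call", "2nd", "txt")  # "2nd" is not a valid R name

# Hypothetical labels that depend mostly on the "free" count.
label <- factor(ifelse(toy.dfm$free + rnorm(n) > 1.5, "spam", "ham"))

# 1. Bind labels to features, then repair column names ("2nd" -> "X2nd")
#    so the formula Label ~ . can refer to every column.
train.df <- cbind(Label = label, toy.dfm)
names(train.df) <- make.names(names(train.df))

# 2. Stratified folds: 10-fold CV repeated 3 times; createMultiFolds keeps
#    the ham/spam ratio roughly the same in every fold.
cv.folds <- createMultiFolds(train.df$Label, k = 10, times = 3)
cv.cntrl <- trainControl(method = "repeatedcv", number = 10,
                         repeats = 3, index = cv.folds)

# 3. Register a socket cluster so caret's resampling loop runs in parallel.
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

# 4. Train single decision trees (rpart), trying up to 7 values of the
#    complexity parameter cp and keeping the most accurate one.
rpart.cv <- train(Label ~ ., data = train.df, method = "rpart",
                  trControl = cv.cntrl, tuneLength = 7)

stopCluster(cl)
registerDoSEQ()  # fall back to sequential execution once the cluster is gone

# 5. Inspect the cross-validation results.
print(rpart.cv)                   # accuracy and kappa for each cp value tried
print(rpart.cv$bestTune)          # the cp value that was selected
print(confusionMatrix(rpart.cv))  # confusion matrix averaged over the resamples
```

With real data, `toy.dfm` would be replaced by the document-feature matrix produced by tokenization. Passing the precomputed folds through `index` fixes the resamples, so later models can be compared on exactly the same splits.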
Kaggle Spam Data Set
Introduction to Text Analytics with R