Text Analytics Fundamentals | Introduction to Text Analytics with R Part 2

This part of the series, Text Analytics Fundamentals, covers:

– The importance of splitting data into training and test datasets
– Stratified sampling of imbalanced data using the caret package
– Representing text data for the purposes of machine learning
– Introduction to tokenization, stop words, and stemming
– The bag-of-words model
– Considerations for data pre-processing
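As a rough illustration of the topics above, here is a minimal R sketch under stated assumptions. It performs a stratified 70/30 split with caret's createDataPartition(), then tokenizes, lowercases, removes stop words, stems, and builds a bag-of-words document-feature matrix. The quanteda package, the toy messages, and the column names Label and Text are assumptions for illustration. This page names only caret and does not specify a tokenization package or data layout.

```r
# A minimal sketch, assuming the caret and quanteda packages are installed.
library(caret)
library(quanteda)

# Hypothetical toy messages standing in for the Kaggle spam data set;
# the Label/Text column names are illustrative assumptions.
sms <- data.frame(
  Label = factor(c(rep("ham", 6), rep("spam", 4))),
  Text = c("Are we still meeting for lunch today?",
           "Call me when you get home.",
           "I will be late, the train is delayed.",
           "Thanks for the birthday wishes!",
           "Can you send me the report tonight?",
           "See you at the game on Saturday.",
           "WINNER! Claim your free prize now, call 08001234567.",
           "URGENT: your account has won a cash reward. Reply YES.",
           "Free entry to win a new phone! Text WIN to 80086.",
           "Congratulations, you have been selected for a free holiday!"),
  stringsAsFactors = FALSE
)

# Stratified split: createDataPartition() samples within each class level,
# so the ham/spam ratio is roughly preserved in both splits.
set.seed(32984)
idx <- createDataPartition(sms$Label, p = 0.7, list = FALSE)[, 1]
train <- sms[idx, ]
test <- sms[-idx, ]

# Tokenize the training text, dropping numbers, punctuation, and symbols.
train_tokens <- tokens(train$Text, what = "word",
                       remove_numbers = TRUE, remove_punct = TRUE,
                       remove_symbols = TRUE)

# Lowercase, remove English stop words, then stem each token.
train_tokens <- tokens_tolower(train_tokens)
train_tokens <- tokens_select(train_tokens, stopwords(), selection = "remove")
train_tokens <- tokens_wordstem(train_tokens, language = "english")

# Bag-of-words: one row per message, one column per stemmed term,
# cells hold term counts; word order is discarded.
train_dfm <- dfm(train_tokens, tolower = FALSE)
train_matrix <- as.matrix(train_dfm)
print(dim(train_matrix))
```

The same pre-processing has to be applied to the test set afterwards, with its features aligned to the training vocabulary. Keeping the test rows out of every step above, including building the vocabulary, is what stops information from the test data leaking into training.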

Full Series:
Introduction to Text Analytics with R

Kaggle Dataset:
Kaggle Spam Data Set

The data and R code are available here.

More Data Science Material:
[Video Series] Data Visualization with R and ggplot2
[Blog] R Language Programming for Excel Users


About The Author
- Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.