Your First Test – Introduction to Text Analytics with R Part 11

Your First Test includes specific coverage of:

– Pre-processing new, unseen textual data to allow for predictions from our trained model.
– The importance of caching the IDF values calculated from the training data set, so that the same weights are used when computing TF-IDF for new, unseen, pre-processed data.
– Performing SVD projections of new, unseen, pre-processed textual data into the latent semantic space.
– Creating predictions and evaluating model effectiveness in the context of accuracy, sensitivity, and specificity.
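
The steps listed above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the code from the series: the toy count matrices, the labels, and all function and variable names (term_frequency, inverse_doc_frequency, train_idf, and so on) are hypothetical, the IDF is assumed to be log10(N / document count), and base svd() stands in for whatever SVD routine is used on the real data. The point to notice is that the test documents reuse the IDF vector and the SVD factors computed from the training data and never recompute them.

```r
# Toy term counts: rows are documents, columns are terms.
# (Hypothetical data, not the Kaggle spam data set.)
train_counts <- matrix(
  c(2, 1, 0,
    0, 1, 1,
    1, 0, 2,
    0, 2, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("train", 1:4), c("free", "call", "meeting"))
)

# Term frequency: divide each document (row) by its total term count.
term_frequency <- function(counts) {
  totals <- rowSums(counts)
  totals[totals == 0] <- 1  # guard against empty documents
  counts / totals           # the vector recycles down rows, so row i is divided by totals[i]
}

# Inverse document frequency (assumed form: log10(N / document count)).
# Assumes every term occurs in at least one training document.
inverse_doc_frequency <- function(counts) {
  log10(nrow(counts) / colSums(counts > 0))
}

# Compute the IDF once, from the TRAINING data, and cache it.
train_idf   <- inverse_doc_frequency(train_counts)
train_tfidf <- sweep(term_frequency(train_counts), 2, train_idf, `*`)

# Truncated SVD of the training matrix: X is approximately U * Sigma * t(V).
k         <- 2
train_svd <- svd(train_tfidf, nu = k, nv = k)
sigma_k   <- train_svd$d[1:k]

# New, unseen documents, already pre-processed into the SAME term columns.
test_counts <- matrix(
  c(1, 0, 1,
    0, 3, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("test1", "test2"), colnames(train_counts))
)

# TF comes from the new documents, IDF comes from the cache.
test_tfidf <- sweep(term_frequency(test_counts), 2, train_idf, `*`)

# Project into the latent semantic space: d_hat = d * V * Sigma^-1.
test_latent <- test_tfidf %*% train_svd$v %*% diag(1 / sigma_k, nrow = k)

# Evaluation on hypothetical labels, treating "spam" as the positive class.
actual    <- factor(c("spam", "ham", "ham", "spam", "ham"), levels = c("ham", "spam"))
predicted <- factor(c("spam", "ham", "spam", "ham", "ham"), levels = c("ham", "spam"))
cm <- table(Predicted = predicted, Actual = actual)

accuracy    <- sum(diag(cm)) / sum(cm)                  # all correct / all cases
sensitivity <- cm["spam", "spam"] / sum(cm[, "spam"])   # true positives / actual positives
specificity <- cm["ham", "ham"] / sum(cm[, "ham"])      # true negatives / actual negatives
```

Applying the same projection to train_tfidf reproduces train_svd$u, which is a quick check that training and test documents end up in the same coordinate system. Which class counts as "positive" is a choice: swapping it exchanges sensitivity and specificity, so it is worth confirming what your evaluation tool assumes by default.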

Kaggle Dataset:
Kaggle Spam Data Set

The data and R code here

Full Series:
Introduction to Text Analytics with R

More Data Science Material:
[Video] Steps in Experimentation
[Blog] Text Mining: Breathing Structure to the Unstructured


About The Author
- Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.
