Azure ML – Splitting Data & Categorical Casting

In this tutorial we will make sure all the categorical features are treated as categories using the Edit Metadata module. We will also set up a holdout dataset by randomly sampling our dataset into two partitions, a training set and a test set.

Before we can feed this dataset into a machine learning model in Azure ML, there are two things we have to take care of. First, we have to make sure all the categorical features are treated as categories. We'll use the Edit Metadata module once again to cast these features. Then we need to set up a holdout dataset for future evaluation of any model that we build. We will randomly sample our dataset into two partitions, a training set and a test set. We will lock the test set away and pretend that it's future, real-world data. The assumption is that if the model we build predicts well on this test set, which it has never been exposed to before, it will do about as well on future real-world data.
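In Azure ML both steps are done by dragging modules onto the canvas, but it can help to see the same idea in code. Here is a minimal sketch in Python with pandas, not what Azure ML runs under the hood. The toy data, the column names (`season`, `weather`, `temp`, `rentals`), the 70/30 split fraction, and the random seed are all assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical toy dataset; the columns are illustrative only.
df = pd.DataFrame({
    "season":  [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
    "weather": ["clear", "mist", "rain", "clear", "mist",
                "clear", "rain", "clear", "mist", "clear"],
    "temp":    [9.8, 14.2, 25.1, 12.0, 8.5, 15.3, 27.4, 11.1, 7.9, 16.0],
    "rentals": [16, 40, 110, 55, 13, 48, 95, 60, 10, 52],
})

# Step 1: cast the categorical features so they are treated as categories
# instead of numbers or free text (the code analog of the casting step).
categorical_columns = ["season", "weather"]  # assumed list of categorical features
df = df.astype({col: "category" for col in categorical_columns})

# Step 2: randomly sample the rows into two partitions.
# 70% training / 30% test is an assumed fraction; the seed makes the split repeatable.
train = df.sample(frac=0.7, random_state=42)
test = df.drop(train.index)  # every row not sampled into the training set

print(df.dtypes)
print(len(train), "training rows,", len(test), "test rows")
```

The test partition is built from the rows that were not sampled into the training partition, so no row lands in both. That separation is what lets the test set stand in for data the model has never seen.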

You can get a free trial of Azure here.

Here is the link to the Azure Portal.


About The Author
- Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.
