Summary Statistics & Cleaning Missing Data | Azure ML Tutorial Part 8

Summary Statistics & Cleaning Missing Data – Let’s dig further into the aggregate behavior of our features by looking at summary statistics. Azure Machine Learning gives us easy access to the mean, median, mode, min, and max. We’ll look at each measure to see what it tells us when interpreting the data.
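To see the same five measures outside the Azure ML Studio canvas, here is a minimal pandas sketch that computes them for each numeric column. It assumes the data has already been loaded into a pandas DataFrame. The `summarize` helper and the toy columns `age`, `income`, and `segment` are hypothetical, and this only imitates what the Summarize Data module reports; it is not that module's implementation.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, median, mode, min and max for every numeric column (NaNs skipped)."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "mean": numeric.mean(),
        "median": numeric.median(),
        # mode() returns several sorted rows when values tie; keep the first (smallest).
        "mode": numeric.mode().iloc[0],
        "min": numeric.min(),
        "max": numeric.max(),
    })


# Hypothetical toy data with one missing age.
data = pd.DataFrame({
    "age": [25, 32, 47, 32, None],
    "income": [40000, 52000, 61000, 52000, 58000],
    "segment": ["a", "b", "b", None, "a"],
})

print(summarize(data))
```

Non-numeric columns such as `segment` are left out because the mean, median, min, and max only make sense for numbers. For a categorical feature, the mode (`data["segment"].mode()`) is the measure to look at.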

The Summarize Data module also gives us a count of missing values for each feature, which lets us formulate a strategy for cleaning missing data. The cleaning approach used in this tutorial is not the optimal way to clean data, but we must learn to crawl before we walk. We’ll drop each row that has a missing value in our response class, then use one of the measures of central tendency to fill in the gaps in the other features: the median for numeric features and the mode for categorical features.
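The same strategy can be sketched in pandas, again assuming the data sits in a DataFrame rather than in an Azure ML experiment. The `clean_missing` helper and the toy columns `age`, `color`, and `label` (standing in for the response class) are hypothetical. In Azure ML Studio this job is normally done with the Clean Missing Data module, which this sketch only imitates.

```python
import pandas as pd


def clean_missing(df: pd.DataFrame, response: str) -> pd.DataFrame:
    """Drop rows missing the response, then impute the remaining features."""
    # Step 1: rows with no response value can't be used for training, so drop them.
    cleaned = df.dropna(subset=[response]).copy()

    for col in cleaned.columns:
        if col == response or not cleaned[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(cleaned[col]):
            # Step 2: numeric feature -> fill gaps with the column median.
            cleaned[col] = cleaned[col].fillna(cleaned[col].median())
        else:
            # Step 3: categorical feature -> fill gaps with the most frequent value.
            modes = cleaned[col].mode()
            if not modes.empty:
                cleaned[col] = cleaned[col].fillna(modes.iloc[0])
    return cleaned


# Hypothetical toy data; "label" plays the role of the response class.
data = pd.DataFrame({
    "age": [25, None, 47, 32, 51],
    "color": ["red", "blue", None, "blue", "red"],
    "label": ["yes", "no", "yes", None, "no"],
})

# Count missing values per feature before deciding how to clean them.
print(data.isna().sum())

print(clean_missing(data, "label"))
```

The medians and modes here are computed after the rows with a missing response are dropped, following the order described above; computing them on the full dataset first would be an equally simple variation.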

You can get a free trial of Azure here.

Here is the link to the Azure Portal.

Part 9:
Splitting Data & Categorical Casting 

Part 7:
Dropping and Selecting Columns 

Complete Series:
Introduction to Azure Machine Learning



Phuc H Duong
About The Author
- Phuc holds a Bachelor’s degree in Business with a focus on Information Systems and Accounting from the University of Washington.

