Summary Statistics & Cleaning Missing Data | Azure ML Tutorial

Let’s understand the aggregate behavior of our features further by looking at summary statistics. Azure Machine Learning gives us easy access to the mean, median, mode, min, and max of each feature. Let’s look at each measure to see what it tells us about the data.
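
To make these measures concrete outside of Azure ML, here is a minimal sketch in plain Python using only the standard library. It is not the Azure module itself; the `summarize` helper and the `ages` column are hypothetical, and `None` is assumed to mark a missing value.

```python
import statistics

def summarize(values):
    """Return summary statistics for one numeric feature (None = missing)."""
    # Ignore missing entries when computing the statistics.
    present = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
        "min": min(present),
        "max": max(present),
        "missing": len(values) - len(present),
    }

# Hypothetical feature column with one missing entry and one large value.
ages = [22, 25, 25, 31, None, 90]
summary = summarize(ages)
print(summary)
```

In this made-up column the single large value (90) pulls the mean up to 38.6, while the median and mode both stay at 25. That sensitivity to extreme values is one reason to compare the measures rather than rely on the mean alone.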

The Summarize Data module also gives us a count of missing values for each feature. We can then formulate a strategy for cleaning missing data. The cleaning functions used in this tutorial are not the optimal way to clean data, but we must learn to crawl before we walk. We’ll drop each row that has a missing value in our response class, then use a measure of central tendency to fill in the other features: the median for numeric features and the mode for categorical features.
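
The same strategy can be sketched in plain Python. This is an illustration under stated assumptions, not what the Azure ML modules do internally: the `clean` helper, the column names (`age`, `workclass`, `income`), and the sample rows are all hypothetical, and `None` is assumed to mark a missing value.

```python
import statistics

def clean(rows, response, numeric, categorical):
    """Drop rows with a missing response, then fill the remaining gaps."""
    # Step 1: drop every row whose response value is missing.
    kept = [dict(r) for r in rows if r[response] is not None]

    # Step 2: fill numeric features with the median of the observed values.
    for col in numeric:
        fill = statistics.median([r[col] for r in kept if r[col] is not None])
        for r in kept:
            if r[col] is None:
                r[col] = fill

    # Step 3: fill categorical features with the mode of the observed values.
    for col in categorical:
        fill = statistics.mode([r[col] for r in kept if r[col] is not None])
        for r in kept:
            if r[col] is None:
                r[col] = fill

    return kept

# Hypothetical rows; "income" plays the role of the response class.
rows = [
    {"age": 22, "workclass": "Private", "income": "<=50K"},
    {"age": None, "workclass": "Private", "income": ">50K"},
    {"age": 40, "workclass": None, "income": "<=50K"},
    {"age": 31, "workclass": "State-gov", "income": None},
]
cleaned = clean(rows, response="income", numeric=["age"], categorical=["workclass"])
print(cleaned)
```

Note that this sketch computes the median and mode after the rows with a missing response have been dropped, so the fill values reflect only the rows that remain. Computing them before dropping is an equally plausible choice and can give different fill values.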

You can get a free trial of Azure here.

Here is the link to the Azure Portal.


Phuc H Duong
About The Author
- Phuc holds a bachelor’s degree in Business with a focus on Information Systems and Accounting from the University of Washington.
