Summary Statistics – Data Mining Fundamentals Part 20

Summary statistics are numbers that summarize properties of data, and the frequency of an attribute value is a percentage measuring how often the value occurs in the data set. We will also describe percentiles, and provide examples of each.
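
As a rough illustration of the two ideas named above, here is a minimal sketch in plain Python. The helper names `value_frequencies` and `percentile` and the sample data are hypothetical, invented for this example. The percentile uses the nearest-rank definition, which is only one of several conventions in use.

```python
from collections import Counter
import math


def value_frequencies(values):
    """Return each distinct value's share of the data set, as a percentage."""
    counts = Counter(values)
    total = len(values)
    return {value: 100.0 * count / total for value, count in counts.items()}


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p percent
    of the observations at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical data, made up for illustration only.
colors = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
ages = [23, 25, 31, 35, 38, 41, 47, 52, 58, 64]

freq = value_frequencies(colors)   # share of each color, in percent
median_age = percentile(ages, 50)  # 50th percentile under nearest-rank
p90_age = percentile(ages, 90)     # 90th percentile under nearest-rank
```

Other percentile definitions interpolate between neighboring values, so a library routine may report a slightly different number for the same data.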

All right.
So last, but certainly not least,
is data exploration and visualization.
Data exploration and visualization
are critically important to the practice of data science.
In fact, we’re going to spend the vast majority
of the first day of the boot camp
talking almost exclusively about data exploration
and visualization because it’s just that important.
You need to understand what your data looks like before you
can start to model it properly.
So what is data exploration?
Essentially, data exploration is the visualization and calculation
that allow us to better understand the characteristics
of a dataset.
There are two key motivations for it. One is that we
want to be sure we select the right tools for preprocessing
and analysis.
The other is that it uses our human mind’s really, really powerful
ability to recognize patterns.
A person will recognize a pattern that a data analysis
tool won’t in a lot of contexts.
Building a neural network that will tell you
whether a picture is of a face is a massive endeavor.
It’s a very complicated endeavor.
But humans can do it.
Most humans can do it innately, automatically,
very, very quickly.
So this is, of course, related to the historical term
exploratory data analysis, EDA.
The original book is Exploratory Data Analysis by John Tukey.
And if you’re interested in data exploration, specifically,
there’s some information here.
And this will, of course, be online shortly,
so you can pull that off more quickly.
The original focus of the field of EDA
is not the same as our focus as data scientists.
As data scientists, our focus is on summary statistics
and visualization.
In EDA, clustering and anomaly detection
were treated as exploratory techniques.
So I think, Ron, you have some background in this field, I suspect,
because you were talking about anomaly detection as well.
In our context now, clustering and anomaly detection
are major areas of data science interest, major
sub-fields of their own, not just a piece of exploratory analysis.
Though, clustering for exploratory purposes is still
used a great deal.
Actually, good clustering algorithms and good clustering
practice are among your more powerful tools
if you have a very complicated dataset.
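
To make the idea of clustering for exploratory purposes a little more concrete, here is a minimal sketch of k-means (Lloyd's algorithm) on a single numeric attribute, written in plain Python. The function name `kmeans_1d`, the deterministic starting centers, and the purchase amounts are all assumptions made for this example; the transcript does not name a particular algorithm.

```python
def kmeans_1d(values, k, iterations=20):
    """Tiny k-means for one numeric attribute (Lloyd's algorithm).

    Returns the cluster centers and the values assigned to each center.
    """
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable, so stop early
        centers = new_centers
    return centers, clusters


# Hypothetical purchase amounts, made up for illustration only.
amounts = [12, 15, 14, 13, 95, 102, 99, 101, 16, 98]
centers, clusters = kmeans_1d(amounts, k=2)
```

On data like this, the two groups that come back suggest two distinct kinds of purchases, which is the sort of structure worth knowing about before choosing preprocessing and modeling tools.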


About The Author
- Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.

