Summary statistics are numbers that summarize properties of data, and the frequency of an attribute value is a percentage measuring how often the value occurs in the data set. We will also describe percentiles, and provide examples of each.
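To make those two ideas concrete, here is a minimal sketch in plain Python. The helper names `value_frequencies` and `percentile` and the sample data are made up for illustration, and the percentile uses the nearest-rank convention, which is only one of several common definitions.

```python
from collections import Counter
import math


def value_frequencies(values):
    """Return each distinct value's share of the data set, as a percentage."""
    counts = Counter(values)
    total = len(values)
    return {value: 100.0 * count / total for value, count in counts.items()}


def percentile(values, p):
    """Nearest-rank percentile for 0 < p <= 100: the smallest observed value
    x such that at least p percent of the observations are <= x."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample data, invented for this example.
class_years = ["freshman", "senior", "freshman", "junior", "freshman"]
exam_scores = [55, 61, 70, 72, 78, 81, 85, 90, 94, 99]

print(value_frequencies(class_years))  # share of each class year, in percent
print(percentile(exam_scores, 50))     # a median-like value
print(percentile(exam_scores, 90))     # value at or below which 90% of scores fall
```

Frequency suits categorical attributes such as class year, while percentiles only make sense for values that can be ordered.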
So last, but very certainly not least,
is data exploration and visualization.
Data exploration and visualization
are critically important to the practice of data science.
In fact, we’re going to spend the vast majority
of the first day of the boot camp
talking almost exclusively about data exploration
and visualization because it’s just that important.
You need to understand what your data looks like before you
can start to model it properly.
So what is data exploration?
Essentially, data exploration is visualization and calculation
that allows us to better understand the characteristics of our data.
The key motivations for it are that we
want to be sure we select the right tools for preprocessing,
and that it uses our human mind’s really, really powerful
ability to recognize patterns.
A person will recognize a pattern that a data analysis
tool won’t in a lot of contexts.
Building a neural network, which will tell you
if a picture is of a face, is a massive endeavor.
It’s a very complicated endeavor.
Most humans can do it innately, automatically.
So this is, of course, related to the historical phrase
of exploratory data analysis, EDA.
The original book is Exploratory Data Analysis by John Tukey.
And if you’re interested in data exploration, specifically,
there’s some information here.
And this will, of course, be online shortly,
so you can pull that off more quickly.
The original focus of the field of EDA
is not the same as our focus as data scientists.
As data scientists, our focus is on summary statistics and visualization.
In EDA, clustering and anomaly detection were used as exploratory techniques.
Ron, you have some background in this field, I suspect,
because you’re talking about Natalie as well.
In our context now, clustering and anomaly detection
are major areas of data science interest,
subfields of their own, not just a piece of exploratory analysis.
Clustering for exploratory purposes is still useful, though.
Good clustering algorithms and good clustering
practice are among your more powerful tools
if you have a very complicated dataset.
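As a rough illustration of clustering used for exploration, here is a minimal sketch of k-means (Lloyd's algorithm) on one-dimensional data. The talk does not name a specific algorithm, so k-means is an assumption here, and the function name `kmeans_1d`, the sample measurements, and the starting centers are all made up for the example.

```python
def kmeans_1d(points, centers, iterations=10):
    """Run a fixed number of k-means iterations on 1-D data.

    Returns the final centers and the groups of points from the
    last assignment step.
    """
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put each point with its nearest center.
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Update step: move each center to the mean of its points,
        # keeping the old center if a cluster ended up empty.
        centers = [
            sum(group) / len(group) if group else centers[i]
            for i, group in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical measurements with three loose groups.
measurements = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7, 9.1, 8.9, 9.0]
centers, clusters = kmeans_1d(measurements, centers=[0.0, 5.0, 10.0])

for center, group in zip(centers, clusters):
    print(round(center, 2), group)
```

Looking at the groups and their centers is the exploratory part: it suggests structure in the data that you can then check with plots or summary statistics.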