Data Attributes (cont.) – Data Mining Fundamentals Part 3

We continue our discussion of data attributes by identifying the subsets of attribute classification. These subsets include: categorical, nominal, ordinal, interval, and ratio.

All right, so within these two broad categories of attributes, there are some subsets that are also important to think about. One of the most important is the distinction between categorical and non-categorical attributes. Categorical attributes are discrete attributes that can take only a finite set of values; there are several examples here. Within categorical attributes, there are two useful subsets.

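As a minimal sketch of what "a finite set of allowed values" means in practice, the Python below models one categorical attribute as an enumeration and rejects any value outside that set. The attribute `MaritalStatus`, its four values, and the helper `parse_marital_status` are hypothetical, chosen only for illustration.

```python
from enum import Enum


# Hypothetical categorical attribute: the only values it may take are listed here.
class MaritalStatus(Enum):
    SINGLE = "single"
    MARRIED = "married"
    DIVORCED = "divorced"
    PARTNERED = "living with a partner"


def parse_marital_status(raw: str) -> MaritalStatus:
    """Map a raw string onto the finite value set (hypothetical helper).

    Enum lookup by value raises ValueError for anything not in the set.
    """
    return MaritalStatus(raw.strip().lower())


status = parse_marital_status(" Married ")
print(status.name)  # MARRIED

try:
    parse_marital_status("widowed")
except ValueError:
    print("rejected: not one of the allowed values")
```

Because the value set is closed, anything outside it is treated as an error rather than silently becoming a new category.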
If that finite set of values has a natural ordering, such as rankings, grades, or clothing sizes, we call it an ordinal attribute. Ordinal means that it has an order; pretty straightforward linguistics there. Ordinal attributes are nice because we can code them as integers and maintain the ordering between them.

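To illustrate coding an ordinal attribute as integers, here is a small Python sketch that assumes a hypothetical clothing-size attribute with five labels. Each label is mapped to its rank, so integer comparisons follow the natural order of the sizes.

```python
# Hypothetical ordinal attribute: clothing sizes listed in their natural order.
SIZE_ORDER = ["XS", "S", "M", "L", "XL"]

# Code each label as its position in the ordered list, so that
# comparing codes gives the same answer as comparing sizes.
SIZE_CODE = {size: rank for rank, size in enumerate(SIZE_ORDER)}


def encode_sizes(sizes):
    """Return the integer code for each size label (KeyError on unknown labels)."""
    return [SIZE_CODE[s] for s in sizes]


codes = encode_sizes(["M", "XS", "XL", "S"])
print(codes)  # [2, 0, 4, 1]

# Sorting by code recovers the natural size order.
print(sorted(["M", "XS", "XL", "S"], key=SIZE_CODE.__getitem__))  # ['XS', 'S', 'M', 'XL']
```

One caveat with this design: the codes preserve order, but the gaps between them are not meaningful distances. Nothing says the step from L to XL is the same size as the step from XS to S.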
So we don’t have to treat them particularly specially. Most categorical variables, however, are what we call nominal categorical variables, or nominal attributes. Nominal attributes have no inherent ordering: eye color, zip codes, ID numbers, hair color, or whether someone is married, divorced, or living with a partner. There’s no way you can say, oh yes, blue should have a value of 5 and green should have a value of 2 because I don’t like green eyes. There’s no ordering you can put on those values, so we have to be careful about how we handle nominal attributes in particular.

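One common way to handle nominal attributes is one-hot encoding. It is offered here as an illustration, not as the method the lecture prescribes. Each category gets its own 0/1 indicator column, so no artificial ranking like "blue = 5, green = 2" sneaks in. The `one_hot` helper below is a hypothetical, plain-Python sketch.

```python
def one_hot(values):
    """One-hot encode a list of nominal values (hypothetical helper).

    Returns (categories, rows). `categories` is the sorted list of distinct
    values; sorting only fixes the column order and implies no ranking.
    Each row has a 1 in its value's column and 0 everywhere else.
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


cats, rows = one_hot(["blue", "green", "brown", "blue"])
print(cats)  # ['blue', 'brown', 'green']
print(rows)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

Every pair of distinct categories ends up the same distance apart, which is what you want when the categories have no order.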
Other useful types to think about are those that let us treat a variable specially in ways that are useful and make it easier to handle. On the continuous side, these are interval and ratio variables. You can certainly have interval or ratio variables that are discrete, but for the most part you see them as real-valued, or continuous.

An interval variable is one where the difference between two values is constant and meaningful. Take temperature in Celsius, for instance: the difference in heat between 100 degrees and 90 degrees is the same as the difference between 90 degrees and 80 degrees. So interval variables are basically continuous variables with a consistent metric we can assign them, which makes them easier to handle.

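A quick numerical check of the temperature example, as a Python sketch. The 100-to-90 and 90-to-80 degree gaps in Celsius are equal. Converting to Fahrenheit (°F = °C × 9/5 + 32) scales both gaps by the same factor, so they stay equal. The helper name `c_to_f` is ours, not the lecture's.

```python
def c_to_f(c):
    """Convert Celsius to Fahrenheit: scale by 9/5, then shift by 32."""
    return c * 9 / 5 + 32


readings_c = [100.0, 90.0, 80.0]  # the temperatures from the example above
gap_high_c = readings_c[0] - readings_c[1]  # 100 - 90
gap_low_c = readings_c[1] - readings_c[2]   # 90 - 80
print(gap_high_c == gap_low_c)  # True

# Both gaps are multiplied by the same factor (9/5) and the +32 shift cancels
# in a difference, so equal differences remain equal in the new unit.
readings_f = [c_to_f(c) for c in readings_c]
gap_high_f = readings_f[0] - readings_f[1]
gap_low_f = readings_f[1] - readings_f[2]
print(readings_f)  # [212.0, 194.0, 176.0]
print(gap_high_f, gap_low_f)  # 18.0 18.0
```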
Something like the decibel scale, on the other hand, is much harder to treat as an interval variable. If you’re thinking about the actual intensity of the sound, the decibel scale is logarithmic: every 10 decibels multiplies the intensity by 10. So the difference in intensity between 3 and 4 decibels is much smaller than the difference between 13 and 14 decibels. That’s an example of a continuous variable that isn’t an interval variable.
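The decibel claim can be checked the same way. This sketch assumes the standard definition of sound intensity level, L = 10 · log10(I / I0) decibels, where I0 is a reference intensity. Under that definition the relative intensity is I / I0 = 10^(L / 10). The helper names are hypothetical.

```python
def db_to_relative_intensity(db):
    """Intensity relative to the reference I0, from I / I0 = 10 ** (dB / 10)."""
    return 10 ** (db / 10)


def intensity_gap(db_low, db_high):
    """Difference in relative intensity between two decibel levels (hypothetical helper)."""
    return db_to_relative_intensity(db_high) - db_to_relative_intensity(db_low)


low_gap = intensity_gap(3, 4)     # a 1 dB step near the bottom of the scale
high_gap = intensity_gap(13, 14)  # the same 1 dB step, 10 dB higher up
print(round(low_gap, 3), round(high_gap, 3))  # 0.517 5.166
print(round(high_gap / low_gap, 6))  # 10.0
```

Equal 1 dB steps correspond to intensity changes that differ by a factor of ten, so differences on the decibel scale are not constant in terms of intensity.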

About The Author
- Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.
