# Data Attributes (cont.) – Data Mining Fundamentals Part 3

January 6, 2017 3:00 am

We continue our discussion of data attributes and identifying the subsets of attribute classification. These subsets include: categorical, nominal, ordinal, interval and ratio.

All right, so within these two big categories of attributes, we have some subsets that are also important to think about. One of the most important of these is the distinction between categorical attributes and non-categorical attributes.

Categorical attributes are discrete attributes that have only a finite set of values they are allowed to take. There are several examples here. And within categorical, there are two useful subsets.

If that finite set of values has a natural ordering, so this is something like rankings or grades or clothing sizes, we call that an ordinal attribute. Ordinal means that it has an order, pretty straightforward linguistics there.

And ordinal attributes are nice, because we can code them as integers and maintain the ordering between them.
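As a minimal sketch of that integer coding, assume a hypothetical set of clothing sizes `S`, `M`, `L`, `XL` (the labels and the names `SIZE_ORDER`, `SIZE_CODE`, and `encode_sizes` are illustrative, not from the talk). Each size is mapped to its position in the ordering, so comparing or sorting the codes agrees with comparing or sorting the sizes.

```python
# Hypothetical clothing sizes, listed from smallest to largest.
SIZE_ORDER = ["S", "M", "L", "XL"]

# Map each size label to its position in the ordering.
SIZE_CODE = {size: rank for rank, size in enumerate(SIZE_ORDER)}


def encode_sizes(sizes):
    """Return the integer code for each size label."""
    return [SIZE_CODE[s] for s in sizes]


observed = ["M", "XL", "S", "L"]
codes = encode_sizes(observed)

# Sorting the integer codes recovers the natural order of the sizes.
sorted_labels = [SIZE_ORDER[c] for c in sorted(codes)]
```

Note that the codes only preserve the order. Nothing here claims that the gap between `S` and `M` is the same size as the gap between `L` and `XL`.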

So we don't have to treat them particularly specially. But most categorical variables are what we call nominal categorical variables, or attributes.

Nominal attributes have no inherent ordering to them. So eye color, zip codes, ID numbers, hair color, whether someone is married or not, or divorced, or living with a partner.

There's no way you can say, oh yes, blue should have a value of 5 and green should have a value of 2 because I don't like green eyes. There's no ordering that you can put into those variables. So nominal attributes in particular we have to be careful about handling.
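The talk does not say how to handle nominal attributes at this point. One common option, shown here purely as an assumption, is one-hot encoding: each category gets its own 0/1 indicator column, so no category is assigned a larger or smaller number than another. The function name `one_hot` and the sample eye colors are hypothetical.

```python
def one_hot(values):
    """Return (categories, rows) with one 0/1 indicator column per category."""
    # Alphabetical column order is only a convention here, not a ranking.
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


eye_colors = ["blue", "green", "brown", "blue"]
categories, rows = one_hot(eye_colors)
```

Each row contains exactly one 1, so "blue" and "green" end up equally far apart from every other color instead of sitting at arbitrary positions like 5 and 2.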

There are other useful variable types to think about, types that we can treat specially in ways that make them easier to work with. On the continuous side are interval and ratio variables. You can certainly have intervals or ratios that are discrete, but for the most part, you see them as real, or as continuous.

An interval variable is a measurement where the difference between two values is constant and meaningful. So for instance, with temperature in Celsius, a temperature of 100 degrees and a temperature of 90 degrees have the same difference in heat between them as a temperature of 90 degrees and a temperature of 80 degrees. So interval variables are basically continuous variables that have a nice metric we can assign them, which gives us some nice handling.

Something like the decibel scale, on the other hand, is much harder to handle as an interval, because if you're thinking about the actual intensity of the sound, it's a logarithmic scale. So the difference between 3 decibels and 4 decibels is smaller than the difference between 13 and 14 decibels. So that's an example of a continuous variable that isn't an interval variable.
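To make the decibel comparison concrete, here is a short sketch using the standard relation between a decibel level and intensity relative to a reference, I / I0 = 10^(dB / 10). The function name `relative_intensity` is illustrative, and the values are the 3, 4, 13, and 14 decibel levels from the example above.

```python
def relative_intensity(db):
    """Sound intensity relative to the reference level: I / I0 = 10 ** (dB / 10)."""
    return 10 ** (db / 10)


# The same 1 dB step, taken at two different points on the scale.
step_low = relative_intensity(4) - relative_intensity(3)
step_high = relative_intensity(14) - relative_intensity(13)

print(f"3 dB -> 4 dB changes relative intensity by {step_low:.3f}")
print(f"13 dB -> 14 dB changes relative intensity by {step_high:.3f}")
```

The second step is ten times the first in terms of intensity, even though both are a one-decibel change, which is why equal decibel differences do not correspond to equal differences in intensity.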

**Part 4**: Basic Data Types

**Part 2**: Data Attributes

**Complete Series**: Data Mining Fundamentals
