# Basic Data Types – Data Mining Fundamentals Part 4

January 6, 2017 4:00 am

Data sets can be categorized into three types: record, ordered, and graph. In this tutorial, we will give you examples of when you would want to use each type of data set.

All right, we can move on to data set classification.

There are a lot of different types of data sets.

And they require different approaches to analysis.

The pre-processing steps, the modeling steps,

pretty much everything that you do

with these different types of data sets

is going to be different.

The kinds of models you use, the kinds of visualizations

you construct, the kind of cleaning that

is proper for that kind of data.

Understanding the structure of your data at the beginning

is very important if you want to avoid wasting time and

producing incorrect results.

And it’s in this step, understanding the structure

of your data, that things like domain knowledge

tend to be very important.

But there are still, certainly, categories

that tend to be similar no matter what domain they’re in.

So we’ll talk about these three kinds

of data sets, record, graph, and ordered,

in a little bit more detail coming up here.

So record data is data that consists

of a collection of records, each of which

consists of a fixed set of attributes.

Take this tax data set, for example.

This particular data set, which I use in several places,

is record data.

Every data object has one tax ID, a value for whether they

asked for a refund, a marital status

(single, married, or divorced),

a taxable income field, and a value for whether they

cheated on their taxes or not.

So that’s, sort of, the structure of this data set.

So any data that consists of this kind of collection

of records, each with a fixed set of attributes,

is almost always represented

as a table, whether a database

table, a spreadsheet, or something like that.

And it’s the most common kind of data.

So if you talk about data or data sets,

record data is what a lot of people visualize, entirely.

So it’s, sort of, your most common and, sort of,

fundamental kind of data set.
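To make the idea of a fixed set of attributes concrete, here is a minimal Python sketch of the tax data as a collection of records. The class name `TaxRecord`, the field names, and all the values are assumptions made up for illustration; they are not taken from the actual data set shown in the video.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TaxRecord:
    """One data object: every record has exactly these five attributes."""
    tax_id: int
    refund: bool
    marital_status: str
    taxable_income: float
    cheat: bool


# Illustrative values only (hypothetical, not the original data).
records = [
    TaxRecord(1, True, "Single", 125_000.0, False),
    TaxRecord(2, False, "Married", 100_000.0, False),
    TaxRecord(3, False, "Divorced", 95_000.0, True),
]

# Because the attribute set is fixed, the collection prints naturally as a table:
# one header row of attribute names, then one row per record.
attribute_names = [f.name for f in fields(TaxRecord)]
print(" | ".join(attribute_names))
for record in records:
    print(" | ".join(str(getattr(record, name)) for name in attribute_names))
```

Each record lines up under the same column headings, which is why a database table or a spreadsheet is the usual home for this kind of data.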

So within record data, there are a few useful subsets.

So this record data, with the tax data,

has some categorical values and then one ordinal variable.

So tax ID is ordinal, right?

Or is it?

It’s really more of a nominal variable, when

you think about it, because ordering doesn’t necessarily

matter.

Sure, it takes numeric values, but an ID of 10

is not meaningfully greater than an ID of five.

There’s no ordering implied here.

So tax ID is a nominal categorical field.

Tax refund is a categorical field, and so is marital status.

Taxable income is a continuous field.

So most data that you encounter has mixed data types like this.

You have some categorical, some numeric,

and that’s, sort of, your traditional type of record

data.
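One way to see why the attribute type matters is to look at which summaries make sense for each field. The sketch below is only an illustration: the rows are hypothetical values, and the choice of summaries (counting for nominal fields, the mean for the continuous one) is an assumption about what you might want to compute.

```python
from collections import Counter
from statistics import mean

# Hypothetical rows: (tax_id, refund, marital_status, taxable_income)
rows = [
    (1, "Yes", "Single", 125_000.0),
    (2, "No", "Married", 100_000.0),
    (3, "No", "Single", 70_000.0),
    (4, "Yes", "Divorced", 95_000.0),
]

# Nominal fields: only equality is meaningful, so we count distinct values.
# Averaging or sorting tax IDs would tell us nothing.
unique_ids = len({row[0] for row in rows})

# Categorical field: counting how often each category appears is meaningful.
status_counts = Counter(row[2] for row in rows)
most_common_status = status_counts.most_common(1)[0][0]

# Continuous field: arithmetic such as the mean is meaningful.
mean_income = mean(row[3] for row in rows)

print(unique_ids, most_common_status, mean_income)
```

The tax ID is stored as a number here, but the code treats it purely as a label, which matches the point that it is nominal rather than ordinal.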

If, on the other hand, your record data consists entirely

of numeric attributes, so it is entirely continuous,

entirely interval or ratio variables,

then we can think of it as a mathematical matrix rather than

just a table.

So we would have an m by n matrix.

There are m rows, one for each data object,

and n columns, one for each attribute.

And this is nice because we can think of these data objects

as points in a multi-dimensional space,

where each attribute is represented

along one dimension.

And that allows us to use a number of numeric techniques,

specifically those involving distance,

which not only make some algorithms easier,

but which some algorithms require.

There are a number of algorithms that

require you to have data matrix data, that is, all numeric data.
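As a rough sketch of what that looks like in practice, the Python below stores all-numeric records as an m by n matrix and computes the distance between every pair of data objects. The values are made up, and Euclidean distance is just one assumed choice of distance measure; the talk does not name a specific one.

```python
import math

# Hypothetical all-numeric data: each row is a data object (a point),
# each column is an attribute (a dimension).
data_matrix = [
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
]
m = len(data_matrix)     # number of data objects (rows)
n = len(data_matrix[0])  # number of attributes (columns)


def euclidean(p, q):
    """Straight-line distance between two points with the same dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Pairwise distances: distances[i][j] is the distance between objects i and j.
distances = [[euclidean(p, q) for q in data_matrix] for p in data_matrix]

print(m, n)
for row in distances:
    print(row)
```

This only works because every attribute is numeric. A field like marital status has no place on an axis until you decide how to encode it.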

