# Ordered Data & Graph Data – Data Mining Fundamentals Part 6

January 6, 2017 6:00 am

We introduce graph data and ordered data, and discuss types of ordered data such as spatio-temporal and genomic data.

So the next big category of data that we’ll talk about briefly here is graph data. The classic example, of course, is HTML, the World Wide Web, which is defined as a graph.

It’s defined by nodes, which are the vertices in our graph (so every webpage is a node), and then a set of edges, which point from one node to another. Those edges can be one-directional, like here, or they can be bidirectional, like here.

And then in addition to edges and nodes, edges in some graphs have weights. So in this case, if it’s an HTML website, the weight might be a count of the number of times this website here links to this website here. So it links five times here, but only two times here.
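A weighted, directed graph like the one described here can be sketched as a dictionary of dictionaries. This is only a minimal illustration: the page names and the helper `link_count` are made up, and only the weights 5 and 2 come from the example in the talk.

```python
# Hypothetical pages; each weight counts how many times one page links to another.
web_graph = {
    "page_a": {"page_b": 5, "page_c": 2},  # page_a links to page_b five times, to page_c twice
    "page_b": {"page_a": 1},               # a link back makes the a-b connection bidirectional
    "page_c": {},                          # page_c has no outgoing links
}


def link_count(graph, source, target):
    """Return the weight of the edge source -> target, or 0 if there is no such edge."""
    return graph.get(source, {}).get(target, 0)


print(link_count(web_graph, "page_a", "page_b"))
print(link_count(web_graph, "page_c", "page_a"))
```

The direction matters: `page_a` points to `page_c`, but looking up the reverse edge returns 0 because `page_c` has no outgoing links.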

So when we’re dealing with graph data (and we won’t talk about this in great detail, because it’s sort of its own sub-problem that we don’t have a lot of time to cover, but it’s good to be aware of), you have to put a lot of thought into how you capture the relationships between the nodes, how you encode your edges and vertices.

You don’t get the same kind of neat structure where there are n attributes that can be represented by n columns, right? Each vertex can have any number of edges coming out of it, anywhere from 0 to, theoretically, an infinite number. So when you’re doing that sort of analysis, you have to handle it differently.
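One common way around the fixed-columns problem is to store one row per edge rather than one row per vertex. The sketch below assumes that edge-list layout; the page names and the `out_degrees` helper are hypothetical and not from the talk.

```python
from collections import defaultdict

# Hypothetical edge list: one (source, target, weight) row per edge,
# so a vertex with many edges simply contributes many rows.
edges = [
    ("page_a", "page_b", 5),
    ("page_a", "page_c", 2),
    ("page_b", "page_a", 1),
]


def out_degrees(edge_list):
    """Count the outgoing edges of every vertex that appears in the edge list."""
    counts = defaultdict(int)
    for source, target, _weight in edge_list:
        counts[source] += 1
        counts[target] += 0  # register vertices that only ever appear as targets
    return dict(counts)


print(out_degrees(edges))
```

Each vertex ends up with a different number of outgoing edges, which is exactly why a single table with n attribute columns per vertex does not fit.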

The last big category of data is ordered data. Now, ordered data is data where the data objects have to be ordered in some way.

So in the case of a genomic sequence, for instance, the ordering of our nucleotides here, GGTTCC, et cetera, is important, right? The fact that we have GGTTCC here is different than if we had CCTT and then GG. Those are fundamentally different sequences, so we have to encode it in some way that preserves that ordering.
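To make the point concrete, here is a small sketch comparing the two sequences mentioned above. An order-blind summary (counting letters) cannot tell them apart, while a position-preserving encoding can. The integer codes in `encode` are an arbitrary assumption for illustration, not a standard.

```python
from collections import Counter

seq_1 = "GGTTCC"
seq_2 = "CCTTGG"

# Counting letters throws the ordering away: both sequences look identical.
same_composition = Counter(seq_1) == Counter(seq_2)

# Comparing position by position keeps the ordering: the sequences differ.
same_sequence = seq_1 == seq_2


def encode(sequence):
    """Map each base to an integer while keeping its position in the list."""
    codes = {"A": 0, "C": 1, "G": 2, "T": 3}  # arbitrary codes, assumed for this sketch
    return [codes[base] for base in sequence]


print(same_composition, same_sequence)
print(encode(seq_1))
print(encode(seq_2))
```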

Another example, and sort of your classic example of ordered data, is spatial and temporal data. So this little GIF here represents the average monthly temperature of both land and oceans over the course of a year.

So in this case, the spatial aspect of the data is important. Where we are in the world certainly matters when we’re looking at a data object. And if we were getting this data, every row in, say, a database table might have a location and a time associated with it, and there is an implicit ordering there, especially to the time, but also to the location.
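As a rough sketch of what such rows might look like, the example below stores a location and a month with each reading and restores the time ordering for one location. The coordinates, temperatures, and the `series_for_location` helper are all made up for illustration; they are not taken from the dataset shown in the video.

```python
from datetime import date

# Made-up readings, deliberately stored out of time order.
readings = [
    {"month": date(2016, 3, 1), "lat": 47.6, "lon": -122.3, "temp_c": 8.1},
    {"month": date(2016, 1, 1), "lat": 47.6, "lon": -122.3, "temp_c": 5.2},
    {"month": date(2016, 1, 1), "lat": -33.9, "lon": 151.2, "temp_c": 23.0},
    {"month": date(2016, 2, 1), "lat": 47.6, "lon": -122.3, "temp_c": 6.4},
]


def series_for_location(rows, lat, lon):
    """Select the rows for one location and return them in time order."""
    selected = [row for row in rows if row["lat"] == lat and row["lon"] == lon]
    return sorted(selected, key=lambda row: row["month"])


for row in series_for_location(readings, 47.6, -122.3):
    print(row["month"].isoformat(), row["temp_c"])
```

Sorting by the time column before any analysis is the simplest way to make the implicit ordering explicit.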

So when we’re handling ordered data, we have to be very careful about it. And this is very important because anytime you’re thinking about doing any kind of sensing or anything like that, you get time series data. It’s the most common type of ordered data, and we’ll talk a lot during the boot camp, especially during the back half, about how we handle time series data.

**Part 7**:

Data Quality

**Part 5**:

Transaction & Document Data

**Complete Series**:

Data Mining Fundamentals

