Ordered Data & Graph Data – Data Mining Fundamentals Part 6

We introduce graph data and ordered data, and discuss types of ordered data such as spatial-temporal and genomic data.

So the next big category of data that we'll talk about briefly here is graph data.
The classic example, of course, is the World Wide Web, which is defined as a graph of HTML pages.
It's defined by nodes, which are the vertices in our graph, so every webpage is a node.
It also has a set of edges, which point from one node to another.
Those edges can be one-directional, like here, or they can be bidirectional, like here.
In addition to nodes and edges, some graphs have weights on their edges.
If this is a set of HTML pages, the weight might be a count of the number of times this page here links to that page there.
So it links five times here, but only two times here.
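To make the nodes, directed edges, and weights concrete, here is a minimal sketch of a small web graph stored as a weighted, directed adjacency list. The page names and link counts are hypothetical and simply mirror the five-link and two-link example above. This is one common representation, not the only one; the `build_graph` helper is illustrative.

```python
from collections import defaultdict

def build_graph(links):
    """Build {source: {target: weight}} from (source, target, count) triples.

    Hypothetical helper for illustration only.
    """
    graph = defaultdict(dict)
    for source, target, count in links:
        # Accumulate in case the same directed edge appears more than once.
        graph[source][target] = graph[source].get(target, 0) + count
    return dict(graph)

# Hypothetical pages and link counts.
links = [
    ("page_a", "page_b", 5),  # page_a links to page_b five times
    ("page_b", "page_a", 2),  # page_b links back twice: together, a bidirectional link
    ("page_b", "page_c", 1),  # a one-directional link
]

web = build_graph(links)
print(web["page_a"]["page_b"])              # 5
print(web["page_b"]["page_a"])              # 2
print("page_a" in web.get("page_c", {}))    # False: page_c has no outgoing links
```

In this layout a bidirectional link is simply two directed edges, and each direction carries its own weight.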
We won't talk about graph data in great detail, because it's sort of its own sub-problem that we don't have a lot of time to cover, but it's good to be aware of.
When you're dealing with graph data, you have to put a lot of thought into how you capture the relationships between the nodes, that is, how you encode your edges and vertices.
You don't get the same kind of neat structure where there are n attributes that can be represented by n columns, right?
Each vertex can have any number of edges coming out of it, anywhere from zero to, theoretically, infinitely many.
So when you're doing that sort of analysis, you have to handle it differently.
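Because each vertex can have a different number of edges, a single row of fixed columns per node does not fit. A common workaround is to store the graph as an edge list, with one row per edge. The sketch below uses a hypothetical graph to show the varying out-degree, and then shows that every edge-list row still has the same three columns: source, target, and weight.

```python
# Hypothetical graph: each node maps to {target: weight}.
adjacency = {
    "page_a": {"page_b": 5, "page_c": 1, "page_d": 3},
    "page_b": {"page_a": 2},
    "page_c": {},  # a node with no outgoing edges
    "page_d": {},
}

# Out-degree varies from node to node, starting at zero.
out_degree = {node: len(targets) for node, targets in adjacency.items()}
print(out_degree)  # {'page_a': 3, 'page_b': 1, 'page_c': 0, 'page_d': 0}

# Edge list: every row has exactly three columns, however many edges a node has.
edge_rows = [
    (source, target, weight)
    for source, targets in adjacency.items()
    for target, weight in targets.items()
]
for row in edge_rows:
    print(row)
```

The trade-off is that per-node attributes such as degree now have to be computed from the edge rows, rather than read off a single row.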
The last big category of data is ordered data.
Ordered data is data where the data objects, or the values within them, have an ordering that matters.
In a genomic sequence, for instance, the ordering of the nucleotides here (GGTTCC, et cetera) is important, right?
The fact that we have GGTTCC here is different than if we had had CCTT and then GG.
Those are fundamentally different sequences, so we have to encode them in some way that preserves that ordering.
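To see why the encoding has to preserve order, compare GGTTCC with CCTTGG. They contain exactly the same nucleotides, so a plain count of each letter cannot tell them apart. The sketch below contrasts that order-free encoding with an order-preserving one built from overlapping k-mers (substrings of length k). The k-mer approach is one common choice that we swap in for illustration; the lecture does not prescribe it, and the `kmers` helper is hypothetical.

```python
from collections import Counter

def kmers(sequence, k):
    """Return the overlapping substrings of length k, in order (hypothetical helper)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

seq1 = "GGTTCC"
seq2 = "CCTTGG"

# Order-free encoding: just count each nucleotide.
print(Counter(seq1) == Counter(seq2))  # True: same composition, so the difference is lost

# Order-preserving encodings keep the two sequences apart.
print(seq1 == seq2)    # False
print(kmers(seq1, 2))  # ['GG', 'GT', 'TT', 'TC', 'CC']
print(kmers(seq2, 2))  # ['CC', 'CT', 'TT', 'TG', 'GG']
```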
Another example, and sort of the classic example of ordered data, is spatial and temporal data.
This animated GIF shows the average monthly temperature of both land and oceans over the course of a year.
In this case, the spatial aspect of the data is important: where we are in the world certainly matters when we're looking at a data object.
If we were collecting this data, every row in, say, a database table might have a location and a time associated with it, and there is an implicit ordering there, especially in time, but also in location.
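Here is a minimal sketch of rows that each carry a location and a time, as described above. The readings are invented for illustration and do not come from the animation; `site_1` and `site_2` are hypothetical locations. Sorting by location and then by time restores the implicit ordering. Only after that sort does a month-to-month change for each location mean anything.

```python
from itertools import groupby

# Hypothetical rows: (location, month, average temperature in °C).
# Values are made up for illustration and arrive out of order.
rows = [
    ("site_2", 2, 4.0),
    ("site_1", 1, 10.0),
    ("site_2", 1, 3.0),
    ("site_1", 3, 14.0),
    ("site_1", 2, 11.5),
]

# Restore the implicit ordering: group by location, then order by time.
ordered = sorted(rows, key=lambda r: (r[0], r[1]))

# Month-to-month change per location; this depends on the rows being in time order.
changes = {}
for location, group in groupby(ordered, key=lambda r: r[0]):
    temps = [temp for _, _, temp in group]
    changes[location] = [later - earlier for earlier, later in zip(temps, temps[1:])]

print(changes)  # {'site_1': [1.5, 2.5], 'site_2': [1.0]}
```

Note that `itertools.groupby` only merges consecutive rows with the same key, which is another reason the sort has to come first.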
So when we're handling ordered data, we have to be very careful about it.
This is very important because any time you're doing any kind of sensing, with sensors or instruments of any kind, you get time series data.
Time series are the most common type of ordered data, and we'll talk a lot during the boot camp, especially in the back half, about how we handle time series data.


About The Author
- Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.
