Transaction Data & Document Data – Data Mining Fundamentals Part 5

Transaction data is record data in which each record involves a set of items. In this part, we discuss how it is used in data mining, along with another useful subcategory of record data: document data.

Another useful subcategory of record data is document data. In some ways it is similar to a data matrix: every entry, that is, every attribute, has a numeric value. Here, though, the values are counts, so they are discrete. Each row (each data object) is represented by what we call a term vector.
There are several ways to build a term vector, but the one used here simply counts the number of times a given word appears in the document. Document 1 contains "team" three times and "play" five times, but "coach" not at all. Document 2, on the other hand, contains "coach" seven times but never contains "play".
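The counting scheme described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the two documents and the five-word vocabulary are hypothetical (chosen so the counts match the example), and tokenization is just lowercase-and-split on whitespace.

```python
from collections import Counter

# Hypothetical vocabulary: the attributes (columns) of the document-term matrix.
VOCAB = ["team", "coach", "play", "ball", "score"]

def term_vector(text: str, vocab: list[str]) -> list[int]:
    """Count how many times each vocabulary word appears in `text`."""
    # Naive tokenization (an assumption): lowercase, then split on whitespace.
    counts = Counter(text.lower().split())
    # Counter returns 0 for missing keys, so absent words get a count of 0.
    return [counts[word] for word in vocab]

# Hypothetical documents.
docs = {
    "doc1": "team play team play play score team play play",
    "doc2": "coach coach ball coach coach score coach coach coach",
}

matrix = {name: term_vector(text, VOCAB) for name, text in docs.items()}
for name, vec in matrix.items():
    print(name, vec)
# doc1 [3, 0, 5, 0, 1]
# doc2 [0, 7, 0, 1, 1]
```

Each printed row is one data object; stacking the rows gives a matrix whose entries are all non-negative integers.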
Because these attributes are all discrete integer counts, the algorithms and processing methods that suit document data differ from those that suit general data matrices or mixed data.
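As one illustration of such a method (an example of our choosing, not named in the lecture), term vectors are often compared with cosine similarity, which depends only on the words the documents actually contain and is unaffected by the many words both documents lack. The sketch below assumes two hypothetical term vectors over the same five-word vocabulary.

```python
import math

def cosine_similarity(a: list[int], b: list[int]) -> float:
    """Cosine of the angle between two term vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical term vectors over the vocabulary [team, coach, play, ball, score].
doc1 = [3, 0, 5, 0, 1]
doc2 = [0, 7, 0, 1, 1]

# The only word both documents contain is "score", so the similarity is near 0.
print(round(cosine_similarity(doc1, doc2), 3))  # 0.024
```

Positions where both vectors are 0 add nothing to the dot product, which is why shared absences do not make two documents look alike.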
The last special kind of record data we cover here is transaction data. It shares some similarities with document data, and some of the same analyses apply, but its semantics are different. Transaction data is exactly what it sounds like: record data where each record involves a set of items. At a grocery store, for example, the set of products a customer buys during one shopping trip constitutes a transaction, and the individual products purchased are the items.
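Treating each transaction as a set of items, a minimal Python sketch might look like the following. The transactions are hypothetical, and `support_count` and `to_binary_vector` are illustrative helpers: the first counts how many transactions contain a given group of items (a standard market-basket quantity), and the second shows the resemblance to document data by turning each item into an attribute that records whether it was bought.

```python
# Hypothetical grocery-store transactions: each set is one shopping trip.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support_count(itemset: set[str], transactions: list[set[str]]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def to_binary_vector(transaction: set[str], items: list[str]) -> list[int]:
    """1 if the item was bought in this transaction, else 0."""
    return [1 if item in transaction else 0 for item in items]

# All distinct items, in a fixed (alphabetical) order, act as the attributes.
items = sorted(set().union(*transactions))

print(items)  # ['beer', 'bread', 'cola', 'diapers', 'eggs', 'milk']
print(support_count({"diapers", "beer"}, transactions))  # 3
print(to_binary_vector(transactions[0], items))  # [0, 1, 0, 0, 0, 1]
```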
The difference from document data is that these items usually carry more information than just a count. An item is not only "bread": it has a price, and possibly an inventory level (how many are left in stock), along with other attributes of that sort.
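To illustrate items that carry more than a count, here is a sketch in which each product has a price and a stock level. The catalog, the quantities, and the helper names are hypothetical, and prices are stored as integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    price_cents: int  # unit price, in cents
    in_stock: int     # units currently left on the shelf

# Hypothetical product catalog: per-item attributes beyond "was it bought?".
catalog = {
    "bread": Item("bread", 250, 40),
    "milk": Item("milk", 120, 25),
    "eggs": Item("eggs", 300, 12),
}

# One hypothetical transaction: item name -> quantity purchased.
transaction = {"bread": 2, "milk": 1}

def basket_total_cents(transaction: dict[str, int], catalog: dict[str, Item]) -> int:
    """Total price of the transaction, using each item's catalog price."""
    return sum(catalog[name].price_cents * qty for name, qty in transaction.items())

def stock_after(transaction: dict[str, int], catalog: dict[str, Item]) -> dict[str, int]:
    """Units left for each purchased item (the catalog itself is not modified)."""
    return {name: catalog[name].in_stock - qty for name, qty in transaction.items()}

print(basket_total_cents(transaction, catalog))  # 620
print(stock_after(transaction, catalog))         # {'bread': 38, 'milk': 24}
print(sorted(transaction))                       # ['bread', 'milk']
```

The last line drops everything except the item names, which recovers the plain set-of-items view where document-style analysis applies.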
So we can do things similar to document analysis, but there is other information we have to consider as well. That’s transaction data.




About The Author
- Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.

