# Transaction Data & Document Data – Data Mining Fundamentals Part 5

January 6, 2017 5:00 am

Transaction data is record data where each record involves a set of items; we will discuss how it works in data mining. We will also discuss another useful subcategory of record data, document data.

So another useful subcategory of record data is document data. It is somewhat similar to a data matrix: every entry, every data attribute, has a numeric value. But in this case the values are counts, so they are discrete.

So what we have here is that each row, each data object, is represented by what we call a term vector. There are several ways you can build one, but in this case it just counts the number of times a given word appears in the document. So document 1 has "team" appear three times and "play" appear five times, but "coach" never appears. Document 2, on the other hand, has "coach" appear seven times, but never has "play" appear over the course of the document.
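To make the term vector idea concrete, here is a minimal sketch in Python. The two toy documents are invented purely so that the counts match the ones mentioned above; the three-word vocabulary, the `term_vector` helper, and the zero count for "team" in document 2 are assumptions for illustration, not something taken from the talk.

```python
from collections import Counter

# Toy documents, invented only to reproduce the counts mentioned in the talk.
# Document 2's count for "team" is not stated there; zero is an assumption.
documents = {
    "document 1": " ".join(["team"] * 3 + ["play"] * 5),
    "document 2": " ".join(["coach"] * 7),
}

# Hypothetical vocabulary: one attribute (column) per term.
vocabulary = ["team", "coach", "play"]


def term_vector(text, vocabulary):
    """Count how many times each vocabulary term appears in the text."""
    counts = Counter(text.lower().split())
    # Counter returns 0 for terms that never appear.
    return [counts[term] for term in vocabulary]


# One row (term vector) per document.
term_vectors = {
    name: term_vector(text, vocabulary) for name, text in documents.items()
}

for name, vector in term_vectors.items():
    print(name, dict(zip(vocabulary, vector)))
```

Each row is one data object and each column is one term, so the result looks like a small data matrix whose entries are all non-negative integers.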

So because these attributes are all discrete, because they’re all integer attributes, different kinds of algorithms and processing methods are more appropriate here than they would be for data matrices or mixed data.

All right, so the last special kind of record data that we’re going to talk about here is transaction data. It shares some similarities with document data, and you can use some of the same analysis, but there are different semantics around it as well.

So transaction data is exactly what it sounds like: it’s record data where each record involves a set of items. If we’re at a grocery store, the set of products purchased by a customer during one shopping trip constitutes a transaction, and the individual products that were purchased are the items.

So the difference between this and document data is that these items usually have more information than just a count associated with them. It’s not only bread: there’s a price associated with it, and maybe an inventory level, how many are left, all of those sorts of things. So we can do things similar to document analysis, but there’s other information we have to consider as well.
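One way to picture this is the rough sketch below. Everything in it is hypothetical: the `Item` class, the product names, the prices (stored in cents), the stock levels, and the three transactions are made up for illustration. It shows a transaction both as a count vector, like a document, and together with the extra per-item information that a count alone would lose.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical item record: more than just a name and a count."""
    name: str
    price_cents: int  # made-up unit price, in cents
    stock_left: int   # made-up number of units still in inventory


# Made-up catalog of products, keyed by item name.
catalog = {
    "bread": Item("bread", 250, 12),
    "milk": Item("milk", 120, 30),
    "eggs": Item("eggs", 300, 8),
}

# Each transaction is the set of items bought on one shopping trip.
transactions = {
    1: ["bread", "milk"],
    2: ["bread", "eggs", "milk"],
    3: ["eggs"],
}

# Fixed column order for the count vectors: ['bread', 'eggs', 'milk'].
item_names = sorted(catalog)


def item_counts(purchased, item_names):
    """Document-style view: how many of each item is in the transaction."""
    counts = Counter(purchased)
    return [counts[name] for name in item_names]


def transaction_total(purchased):
    """Uses the extra per-item information (price) that counts do not carry."""
    return sum(catalog[name].price_cents for name in purchased)


count_vectors = {tid: item_counts(p, item_names) for tid, p in transactions.items()}
totals = {tid: transaction_total(p) for tid, p in transactions.items()}

for tid in transactions:
    print(tid, count_vectors[tid], totals[tid])
```

The count vectors could feed the same kind of analysis as term vectors, while the totals depend on attributes that exist only on the items themselves.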

So that’s transaction data.

**Part 6**:

Graph & Ordered Data

**Part 4**:

Basic Data Types

**Complete Series**:

Data Mining Fundamentals

