# Evaluating Correlation – Data Mining Fundamentals Part 19

January 6, 2017 8:00 pm

Evaluating correlation, both numerically and visually, is the next step in our discussion of similarity and dissimilarity. Correlation measures the linear relationship between objects, and to evaluate it visually you will need to build a scatter plot.

Another very common measure, one that I'm sure Ron in particular is very familiar with as a statistician, is correlation. Correlation essentially measures the linear relationship between objects. A good way to think about it is that it tells us whether objects p and q move together.

What we do is standardize each object's attributes and then take their dot product. Once that dot product is scaled by the number of attributes, it gives us a value between negative 1 and 1, so it's not exactly a standard similarity measurement. But we can square it, and then it falls between 0 and 1 and becomes a standard similarity measurement.
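The standardize-then-dot-product recipe described above can be sketched in plain Python. This is a minimal sketch rather than a definitive implementation: the function names `standardize` and `correlation` are made up for illustration, it assumes the population standard deviation, and it divides the dot product by the number of attributes so the result lands between -1 and 1.

```python
import math


def standardize(values):
    """Shift values to mean 0 and scale them to population standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        # A constant vector has no spread, so correlation is undefined for it.
        raise ValueError("cannot standardize a constant attribute vector")
    return [(v - mean) / std for v in values]


def correlation(p, q):
    """Correlation of p and q: dot product of the standardized vectors, divided by n."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of attributes")
    p_std = standardize(p)
    q_std = standardize(q)
    dot = sum(a * b for a, b in zip(p_std, q_std))
    return dot / len(p)


# Hypothetical attribute vectors, chosen only for illustration.
p = [1, 2, 3, 4, 5]
q = [2, 1, 4, 3, 5]

r = correlation(p, q)
print("correlation:", r)
print("squared correlation (between 0 and 1):", r ** 2)
```

Squaring throws away the sign, so a strong negative relationship and a strong positive one both end up close to 1 on the squared scale.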

That squared value is called the coefficient of determination. To keep the two straight: r is the correlation, and r squared is the coefficient of determination. The two terms tend to get used somewhat interchangeably in data science.

Here, for those of you who haven't had much statistics or who don't remember it, is a visual example of correlations. When the correlation is negative 1, which is the lowest possible value, we have a perfectly linear relationship: as one object goes up, the other comes down, whatever up and down happen to mean in this context.

With a correlation of 1, the objects go up together or come down together. As we get to correlations that are closer to 0, we can see that the data has very little linear relationship, whereas as we get closer to 1 and negative 1, we see a sharper and sharper linear relationship between the two.
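To see the same pattern in numbers rather than in scatter plots, here is a small sketch that generates synthetic points along a line and adds increasing amounts of noise. Everything in it is an assumption made for illustration: the helper names `pearson_r` and `noisy_line`, the slopes, the noise levels, and the random seed are not from the original material.

```python
import math
import random


def pearson_r(xs, ys):
    """Correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def noisy_line(slope, noise_std, n=2000, seed=0):
    """Synthetic data: y = slope * x plus Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [slope * x + rng.gauss(0, noise_std) for x in xs]
    return xs, ys


# Rising lines, then falling lines, each with more or less noise added.
for slope, noise_std in [(1, 0), (1, 1), (1, 50), (-1, 50), (-1, 1), (-1, 0)]:
    xs, ys = noisy_line(slope, noise_std)
    print(f"slope={slope:+d} noise_std={noise_std:>2}: r = {pearson_r(xs, ys):+.3f}")
```

With no noise the correlation sits at 1 or negative 1, depending on the direction of the line, and it drifts toward 0 as the noise swamps the linear trend. Plotting each `xs`, `ys` pair as a scatter plot would give the kind of picture described above.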

Correlation is one of the metrics that we use to evaluate regression models, so we'll talk about it more in that context. I just wanted to make sure we introduced it here, so that you've heard the term if you haven't had much of a statistics background or it's been a while.

**Part 20**:

Evaluating Correlation

**Part 18**:

Euclidean Distance & Cosine Similarity

**Complete Series**:

Data Mining Fundamentals



**Tags:** Data Mining