A confusion matrix, also known as an error matrix, is a table that helps you visualize the performance of your classification model. That way, you can easily see how successful your model was when predicting each class. In this introduction, we give you a brief overview of what a confusion matrix is, how to create your matrix, and why you should use it.

Topics include: true positives and negatives, target classes, and predictive models.

Welcome to this quick introduction to the confusion matrix. If you’ve ever looked at a confusion matrix for the first time, you might have found it, well, confusing. But it doesn’t have to be. A confusion matrix is a simple way to lay out how many cases were predicted as the correct category or class and how many were not. It is used to evaluate the results of a predictive model with a class outcome, to see the number of cases that were correctly predicted as their true class. To understand what’s going on inside this confusion matrix of correct classes versus incorrect classes, we first need to understand true positives, true negatives, false positives and false negatives. Kind of confusing, right? Well, let’s relabel these terms to make it a bit clearer. Essentially, the confusion matrix is just keeping track of Class A correctly predicted as Class A, Class B correctly predicted as Class B, Class A incorrectly predicted as Class B, and Class B incorrectly predicted as Class A. Where true and false come into it is that we want to know whether our target Class A was correctly predicted as A, which is true, or incorrectly predicted as B when in fact it was A, which is false.
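
The four outcomes described here can be tallied with a few lines of Python. This is only a minimal sketch, not part of the video: the function name `count_outcomes` and the six example labels are made up for illustration, and Class A is assumed to be the positive (target) class.

```python
from collections import Counter

def count_outcomes(actual, predicted, positive="A"):
    """Tally the four confusion-matrix outcomes for a two-class problem."""
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if truth == positive and guess == positive:
            counts["true_positive"] += 1    # A correctly predicted as A
        elif truth != positive and guess != positive:
            counts["true_negative"] += 1    # B correctly predicted as B
        elif truth == positive:
            counts["false_negative"] += 1   # A incorrectly predicted as B
        else:
            counts["false_positive"] += 1   # B incorrectly predicted as A
    return counts

# Hypothetical labels, invented for this illustration only.
actual    = ["A", "A", "A", "B", "B", "B"]
predicted = ["A", "B", "A", "B", "A", "B"]

outcomes = count_outcomes(actual, predicted)
print(dict(outcomes))
```

With these made-up labels, two A cases and two B cases are predicted correctly, one A is mistaken for B, and one B is mistaken for A.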

Our target Class A is our positive and the other Class B is our negative, so a true positive is a positive Class A case correctly predicted as Class A, and a true negative is a negative Class B case correctly predicted as B. We want to get as many correct predictions of A and B as possible, aiming for more trues rather than falses. So then how do we organize this in a way to lay out the number of correct A’s and B’s versus incorrect A’s and B’s? Well, we draw a grid. We place these into a matrix grid where the x-axis is the predictions made and the y-axis is the actual class label. So let’s just say we have 200 subjects, of which 100 are from Class A and 100 from Class B. Sixty of the actual A cases on the y-axis were correctly predicted as their true class, A on the x-axis.
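
As a rough sketch, the grid can be written as a nested list with actual classes as rows (the y-axis) and predicted classes as columns (the x-axis). The counts come from this example: 60 of the 100 actual A cases predicted as A and, as the example goes on to say, 30 of the 100 actual B cases predicted as B. The off-diagonal counts (40 and 70) are not stated in the video; they are derived here by subtraction. The variable and function names are illustrative.

```python
# Rows are actual classes (y-axis); columns are predicted classes (x-axis).
classes = ["A", "B"]
matrix = [
    [60, 40],  # actual A: 60 predicted as A, 100 - 60 = 40 predicted as B
    [70, 30],  # actual B: 100 - 30 = 70 predicted as A, 30 predicted as B
]

def diagonal_total(grid):
    """Sum the diagonal cells, i.e. the subjects predicted as their true class."""
    return sum(grid[i][i] for i in range(len(grid)))

correct = diagonal_total(matrix)
total = sum(sum(row) for row in matrix)

# Print the grid with a header row of predicted classes.
print("actual/predicted", *classes, sep="\t")
for label, row in zip(classes, matrix):
    print(label, *row, sep="\t")
print(f"{correct} of {total} subjects were predicted as their true class")
```

Reading the diagonal of this grid gives 60 + 30 = 90 correctly predicted subjects out of 200.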

For Class B, 30 actual B cases on the y-axis were correctly predicted as Class B on the x-axis. If you look at the diagonal counts, that’s how many subjects were correctly predicted as their classes. So these are all the trues for the positive and negative classes. So now you have a way of identifying which class is predicted correctly most of the time compared to other classes, and of evaluating whether your predictive model is guessing right most of the time or guessing wrong on each of these classes.

One last thing: how do we decide what is Class A and what is Class B? What should be the positive class, and what should be the negative class? Well, most of the time it doesn’t matter which class you assign to positive or negative, as the confusion matrix will tell you how many subjects were correctly predicted from each class. But here are some examples of a target class you might want to differentiate from a non-target class. In standard binary classification, you could be interested in returning customers as the positive target class versus new customers as the negative class. Or it could be one target class versus all other classes, such as aggressive cancer versus all other, passive types of cancer grouped as the single negative class. Either way, you’re comparing how many were correctly or incorrectly classified from each class.
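
The "one target class versus all other classes" setup can be sketched by relabeling every class as either positive or negative before counting. This is a hedged illustration only: the helper names `to_binary` and `confusion_counts`, the label strings, and the five example cases are all hypothetical and do not come from the video or from real data.

```python
def to_binary(labels, target):
    """Relabel a multi-class list as target ('positive') versus all others ('negative')."""
    return ["positive" if label == target else "negative" for label in labels]

def confusion_counts(actual, predicted):
    """Count (actual, predicted) pairs; each key is one cell of the confusion matrix."""
    counts = {}
    for pair in zip(actual, predicted):
        counts[pair] = counts.get(pair, 0) + 1
    return counts

# Hypothetical labels, invented for this illustration only.
actual    = ["aggressive", "passive_1", "passive_2", "aggressive", "passive_1"]
predicted = ["aggressive", "passive_2", "aggressive", "passive_1", "passive_1"]

cells = confusion_counts(
    to_binary(actual, "aggressive"),
    to_binary(predicted, "aggressive"),
)
for (truth, guess), count in sorted(cells.items()):
    print(f"actual {truth}, predicted {guess}: {count}")
```

Note that in this sketch a `passive_1` case predicted as `passive_2` still counts as a correct negative, because both labels fall into the single negative class.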

The confusion matrix will tell you how many times actual Class A was predicted as B and vice versa, or if they were correctly classed as their true labels. And that sums up the confusion matrix. Thanks for watching. Give us a like if you found it useful, or you can check out our other video tutorials at tutorials.datasciencedojo.com.

**Learn more about Classification Models:**

Introduction to Classification Models

Precision, Recall and F1 in Classification

One Versus One vs. One Versus All in Classification

**Complete Series**:

Data Science in Minutes

**More Data Science Material:**

[Video] Introduction to Big Data, Data Science and Predictive Analytics

[Video] Predictive Modeling with R and Azure ML

[Blog] Azure ML Tutorial – Build a Predictive Model
