Mean Absolute Error for Forecast Evaluation: Time Series in Python Part 3

In part 3 of this video series, learn how to evaluate time series model predictions using mean absolute error and Python’s statistics and matplotlib packages. We plot actual versus predicted values to see how they differ, and calculate the mean absolute error to help evaluate our ARIMA time series model. We also look at potential issues when modeling time series, and how to take this further and learn more in depth. This series is intended for intermediate and advanced users. We have a data science bootcamp for complete beginners!

Hi, welcome back to this Data Science Dojo video tutorial series on time series.
In part two we left off after modeling our data and predicting five
timestamps ahead into the future. In part three we’ll evaluate our predictions and
see how far off the mark they were from the actual values in our holdout data,
the last five timestamps of our full sample dataset.
So now we’re going to plot actual versus predicted.
We’re going to get two versions of our time series:
one with all our values, where the last five are actual values,
and one, overlaid on the same plot, with all the values again but with the
last five being predicted values. We should see some difference between the
actual and predicted values in the last five timestamps.
So first we’re going to read in our entire sample, which includes the last five values as our actual values.
Once again we’re going to use pandas’ read_csv function
to read in our full sample, or entire dataset.
And once again, we’ll use our datetime column, which is the first column, as our index column,
and we’ll parse these dates.
We’ll use the squeeze option to return a series.
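A minimal sketch of this loading step, assuming the CSV’s first column holds the timestamps. The file contents below are made up, and `load_series` is a hypothetical helper name. One caveat: the `squeeze=True` argument used in the video was deprecated in pandas 1.4 and removed in pandas 2.0, so this sketch squeezes the one-column DataFrame explicitly instead:

```python
import io

import pandas as pd


def load_series(path_or_buffer):
    """Read a CSV whose first column holds timestamps into a pandas Series."""
    # index_col=0 makes the first (datetime) column the index;
    # parse_dates=True parses that index into a DatetimeIndex.
    frame = pd.read_csv(path_or_buffer, index_col=0, parse_dates=True)
    # read_csv(..., squeeze=True) no longer exists in pandas 2.0,
    # so the one-column DataFrame is squeezed into a Series here.
    return frame.squeeze("columns")


# Tiny in-memory CSV with made-up values; swap in the path to your own full sample.
demo_csv = io.StringIO(
    "timestamp,sentiment\n"
    "2017-01-01 00:00:00,0.1\n"
    "2017-01-01 01:00:00,0.2\n"
)
series = load_series(demo_csv)
print(series)
```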
Now I want to print the row values, or index values, of our last five values, our holdout set,
as we’re going to put these into another series. To get them, we just call index on the series
and take the last five positions for our actual values:
starting at 19 and going up to, but not including, 24, so positions 19 to 23.
And we’re going to print these out, so we can have a look as well.
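Selecting those five index labels might look like the following sketch. The hourly timestamps and values are invented stand-ins for the real 24-hour sample, and the name `index_for_values` simply follows the narration:

```python
import pandas as pd

# Made-up stand-in for the 24 hourly values of the full sample.
hours = pd.date_range("2017-01-01", periods=24, freq=pd.offsets.Hour())
full_series = pd.Series([0.01 * i for i in range(24)], index=hours)

# Positions 19 to 23 are the holdout timestamps. The slice end (24) is
# exclusive; full_series.index[-5:] selects the same five entries.
index_for_values = full_series.index[19:24]
print(index_for_values)
```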
Okay, let’s have a look at these.
Alright, so these values here are basically the timestamps for our
holdout set. We’re going to put these into another series with our predicted
values, to tie each predicted value to its timestamp.
So another way you can create a time series is with pandas’ Series function.
We read from a CSV before, but you can do it this way as well.
Give it our predicted values, and we’re going to create the row index.
I’m going to paste in these values here, just so you can see the last five timestamps,
but you can just feed it the index_for_values variable.
I’ll clean these up a bit.
Okay, great. And let’s print this just to make sure that it’s in the correct format.
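A sketch of building the predicted series with `pd.Series`, passing the holdout timestamps as the index. The five predicted values are hypothetical placeholders for the ARIMA forecasts from part 2:

```python
import pandas as pd

hours = pd.date_range("2017-01-01", periods=24, freq=pd.offsets.Hour())
index_for_values = hours[19:24]  # the five holdout timestamps

# Hypothetical forecasts standing in for the five ARIMA predictions from part 2.
predicted_values = [0.21, 0.20, 0.18, 0.17, 0.15]

# Tie each predicted value to its timestamp by passing the index explicitly.
predicted_series = pd.Series(predicted_values, index=index_for_values)
print(predicted_series)
```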
Alright, let’s have a look.
Okay, great. So we have our predicted values tied to
their timestamps in a series, and now we’re going to append that
to our training set, so that, as I said, we have one version of our series with the
predicted values and one version with the actual values. So let’s go ahead and do that.
You can comment these prints out, as we no longer need them.
And I’m just going to print the tail end of this, just to make sure it appended
onto the end of the training set.
Okay, let’s have a look here.
Okay, great. So it looks like it successfully appended onto the training
set here. So now we have a full series with predicted values and a full series with actual.
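On current pandas, this step looks slightly different: `Series.append` was removed in pandas 2.0, so this sketch uses `pd.concat` instead. The training and predicted values are made up:

```python
import pandas as pd

hours = pd.date_range("2017-01-01", periods=24, freq=pd.offsets.Hour())
# Made-up training values (first 19 hours) and predictions (last 5 hours).
train_series = pd.Series([0.01 * i for i in range(19)], index=hours[:19])
predicted_series = pd.Series([0.21, 0.20, 0.18, 0.17, 0.15], index=hours[19:])

# pd.concat stacks the two series in order, keeping their timestamps.
predicted_full = pd.concat([train_series, predicted_series])
print(predicted_full.tail())
```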
Okay, now let’s plot the actual versus predicted.
I’m going to create a plot here.
We’ll start with our predicted values, and I’m going to plot them in orange.
And I’m going to give it a label so I can add a legend later.
And I’ll do the same for the actual values,
but in a different color, maybe blue.
And I’ll also give it a label.
I’m also going to create a legend, so we can tell these lines apart.
I’ll just place it in the upper left, which is a pretty reasonable location.
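The plot described above could be sketched with matplotlib as follows. The two series are invented so that they match for the first 19 hours and diverge over the five holdout hours; the colors, labels and legend position follow the narration:

```python
import matplotlib.pyplot as plt
import pandas as pd

hours = pd.date_range("2017-01-01", periods=24, freq=pd.offsets.Hour())
# Made-up series: identical for the first 19 hours, different over the holdout.
actual_full = pd.Series([0.3 - 0.01 * i for i in range(24)], index=hours)
predicted_full = actual_full.copy()
predicted_full.iloc[19:] = [0.13, 0.12, 0.09, 0.09, 0.06]

fig, ax = plt.subplots()
ax.plot(predicted_full.index, predicted_full.values, color="orange", label="Predicted")
ax.plot(actual_full.index, actual_full.values, color="blue", label="Actual")
ax.legend(loc="upper left")  # the labels let the legend tell the lines apart
plt.show()
```

Because the actual line is drawn second, it sits on top of the predicted line wherever the two coincide, so the orange line only shows where the forecasts differ.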
Okay, let’s have a look at our actual versus predicted.
Let’s see whether it was way off the mark or not.
Okay, so looking at this, the predicted values roughly
follow the same general downward pattern as the actual values.
In places they’re quite off the mark, but from the plot alone we can’t tell exactly how far.
So we need to calculate the mean absolute error as a way of seeing how big
these differences between actual and predicted are, so let’s go ahead and do that.
So I’ll comment these out.
And we’re going to calculate the mean absolute error to evaluate the model and
see whether there’s a big difference between the actual values and the predicted values,
averaged over the holdout set. So first of all we’ll get our actual values from our holdout set,
taking the index positions starting at 19 and ending at 23:
the last five values, our holdout set.
We’ll do the same for the predicted values.
Okay, great. Now we’re going to go through and compare each value:
we’re going to take the first actual value minus the first predicted value,
then the second actual value minus the second predicted value, and so on.
We’re going to store all of these differences in an array
called prediction errors. And then at the end we’re just going to average
their absolute values to get the mean absolute error,
which gives us an idea of the overall error here.
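Written as a formula, with $y_t$ the actual value and $\hat{y}_t$ the predicted value at holdout timestamp $t$ (notation introduced here, not in the video), and $n = 5$ holdout timestamps in this example, the mean absolute error is:

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|
```

Taking absolute values means an overestimate and an underestimate of the same size count equally instead of cancelling out.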
So, for example, we take the first actual value minus the first predicted value,
and we append that to our prediction errors array.
And we want to have a quick look at these differences, to see whether they’re big or not.
Alright, let’s have a look at these.
Between tabs and four spaces, the war begins.
We use four spaces in this instance. Just make sure it’s consistent, because Python
uses indentation to define blocks, so mixing tabs and spaces causes errors.
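The loop described above might look like this sketch, using four-space indentation throughout. The actual and predicted values are made up, and `zip` is used here to pair each actual value with its prediction; the video may index the two lists instead:

```python
# Made-up holdout values; the video uses the last five values of its own data.
actual_values = [0.12, 0.10, 0.07, 0.07, 0.04]
predicted_values = [0.10, 0.11, 0.09, 0.06, 0.05]

prediction_errors = []
for actual, predicted in zip(actual_values, predicted_values):
    # Four spaces here; mixing tabs and spaces can raise a TabError in Python 3.
    prediction_errors.append(actual - predicted)

print(prediction_errors)
```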
So let’s run this again.
Okay, so here are our differences between actual and predicted.
So they don’t seem too bad. In some cases, though, they might be quite far off the mark:
considering that our values go to six places after the decimal point,
differences of 0.02 or 0.025 might be quite big.
But the way to really judge this is to average their absolute values.
Okay, we’ll store it in a variable called mean absolute error, and
we’ll use the statistics package to get the mean
of the absolute values of our prediction errors. And that’s pretty much it.
And we obviously want to print this, so let’s have a look at it.
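A minimal sketch of that calculation with Python’s built-in `statistics` module; the five errors are made up rather than taken from the video:

```python
import statistics

# Made-up actual-minus-predicted differences for five holdout timestamps.
prediction_errors = [0.02, -0.01, -0.02, 0.01, -0.01]

# Average the absolute values so positive and negative errors don't cancel out.
mean_absolute_error = statistics.mean(abs(error) for error in prediction_errors)
print(mean_absolute_error)
```

If you already use NumPy or scikit-learn, `numpy.mean(numpy.abs(errors))` or `sklearn.metrics.mean_absolute_error(actual, predicted)` computes the same quantity.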
Okay, so our mean absolute error is about 0.02.
That basically means that, on average, our predictions are off the mark by about 0.02,
either underestimating or overestimating. But considering, as I said,
that our values go to six places past the decimal point, maybe this is quite a big difference,
or maybe it’s not too big of a deal. It’s something we need to consider here:
you’d have to think about this and decide whether you would accept this model as it is.
There are a few problems to be aware of in this model. For one, the
data might not be entirely stationary: even though it looked fairly
stationary to our judgement when we were plotting it before, a formal test would
better determine this. So what we could do is use the augmented Dickey-Fuller
test to check whether the two rounds of differencing that we did resulted in
stationary data.
This might help explain why we’re getting a relatively big mean absolute error.
We’re going to print the p-value for this test. If the p-value is greater
than 0.05, which is our significance level, we fail to reject the null hypothesis
that the data is non-stationary. If it’s less than or equal to 0.05, we
reject that null hypothesis and conclude that the data is stationary.
So if we want the data to be stationary, we want to see a p-value less than or equal to 0.05.
Let’s see if this is the case.
Okay, let’s print this and have a look.
Okay, so we probably wouldn’t accept the model as it is, because the test indicates that we have
stationarity issues with our data: it’s not completely stationary yet.
This could be a reason why the predictions are a bit off the mark, so we need to look at better
transforming this data. One way to do this is to
stabilize the variance by applying, say, the cube root, which can handle
both negative and positive values, and then difference the data.
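A minimal sketch of the cube-root-then-difference idea on a made-up series, using NumPy’s `np.cbrt` and pandas’ `diff`:

```python
import numpy as np
import pandas as pd

# Made-up series containing negative, zero and positive values.
series = pd.Series([-0.8, -0.1, 0.0, 0.3, 1.2, 2.7, 2.9, 4.1])

# np.cbrt is defined for negative inputs, unlike np.log or np.sqrt,
# and compresses large magnitudes, which can help stabilize the variance.
transformed = np.cbrt(series)

# First difference removes a trend in level; dropna() discards the leading NaN.
differenced = transformed.diff().dropna()
print(differenced)
```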
You might also want to compare models with different AR and MA terms. Remember
that when we printed the summary of our model, some terms weren’t
significant enough to be included in the model, so you could try running a
model with just one MA term and see whether that makes a difference to the results.
Another thing to consider is that this is a very small sample: only
24 timestamps in our entire dataset, and 19 in our training set.
There might not be enough data to spare for a holdout set. So to get more out of
your data for training, you could roll the holdout forward one or more timestamps
at a time, giving you several different holdout sets. This allows you to train on more
timestamps, so the model isn’t prevented from learning from the last chunk of timestamps
that would otherwise be locked away in a single holdout set.
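The rolling holdout idea, often called rolling-origin or walk-forward evaluation, can be sketched as a small index generator. This version uses an expanding training window and one-step-ahead test sets; the function name and defaults are illustrative assumptions:

```python
def rolling_origin_splits(n_obs, initial_train, horizon=1, step=1):
    """Yield (train_positions, test_positions) pairs with an expanding training window."""
    origin = initial_train
    # Stop once the next test window would run past the end of the data.
    while origin + horizon <= n_obs:
        yield list(range(origin)), list(range(origin, origin + horizon))
        origin += step


# With 24 timestamps and 19 initial training points, this gives five one-step splits.
for train_positions, test_positions in rolling_origin_splits(24, initial_train=19):
    print(len(train_positions), test_positions)
```

For each split you would refit the model on the training positions, forecast the test positions, and average the absolute errors across all splits. scikit-learn’s `TimeSeriesSplit` offers a similar ready-made splitter.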
Another thing is that the data only covers 24 hours in one day.
Would we be able to capture more of a trend in hourly sentiment if we collected data over several days?
How would you go about collecting more data?
That’s something else to think about.
So what I would like you to do now is take on this challenge and further improve on this model.
You’ve been given a head start; now take this example and improve on it.
Sometimes we get into the habit of just following along and
copying what somebody else is doing, but I want you to think critically about
this, think about some of the issues we talked about, and how you can take this further.
To study time series further, you also need to understand
things like model diagnostics and using the AIC to search for the best model parameters.
You need to be able to handle any datetime data issues, and you might want to
try other modeling techniques.
Time series is something that we plan to introduce in
Data Science Dojo’s post-bootcamp material, but you can learn more during our
short, intensive bootcamp. We cover some key machine learning algorithms and
techniques, and we take you through the critical thinking process behind many
data science tasks. You can check out the curriculum below this video.
But keep fine-tuning and keep practicing.
Thanks for watching.
If you found this video tutorial useful, give us a like.
Otherwise, you can check out our other videos at

Watch Part 1:
Read and Transform your Data: Time Series in Python

Watch Part 2:
ARIMA modeling and forecasting: Time Series in Python

Code, R & Python Script Repository

Packages Used:

More Data Science Material:
[Video] Getting started with Python and R for Data Science
[Video] Web scraping in Python and Beautiful Soup
[Blog] Breaking the Curse of Dimensionality with Python


Rebecca Merrett
About The Author
- Rebecca holds a bachelor’s degree in information and media from the University of Technology Sydney and a postgraduate diploma in mathematics and statistics from the University of Southern Queensland. She has a background in technical writing for games development and has written for tech publications.


