What is MAPE?

Mean Absolute Percentage Error (MAPE) is a statistical measure of how accurate a machine learning model's predictions are on a particular dataset.

MAPE can be used as a loss function during model evaluation: it expresses the error as the average percentage difference between the actual and the estimated values.

In MAPE, the absolute difference between the Actual value (A) and the Estimated/Forecast value (F) is first divided by the actual value, and these ratios are then averaged over all n observations:

MAPE = (1/n) * Σ |(A_i - F_i) / A_i|

MAPE is usually expressed as a percentage by multiplying this value by 100. The lower the MAPE, the better the model fits the data.


Mean Absolute Percentage Error with NumPy module

Let us now implement MAPE using Python NumPy module.

First, we import the dataset (Bike.csv) into the environment.

Then, we split the dataset into training and testing sets using scikit-learn's train_test_split() function.

Next, we define a function to implement MAPE as follows:

  • Calculate the difference between the actual and the predicted values.
  • Then, use numpy.abs() function to find the absolute value of the above differences.
  • Finally, apply numpy.mean() function to get the MAPE.


import numpy as np
from sklearn.model_selection import train_test_split
import pandas as pd
bike = pd.read_csv("Bike.csv")

# Separating the dependent and independent data variables into two data frames.
X = bike.drop(['cnt'], axis=1)
Y = bike['cnt']

# Splitting the dataset into 80% training data and 20% testing data.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=0)

# Defining MAPE function
def MAPE(Y_actual, Y_Predicted):
    mape = np.mean(np.abs((Y_actual - Y_Predicted) / Y_actual)) * 100
    return mape

Now, we implement a linear regression model and check its error rate using MAPE.

Here, we use the LinearRegression() function to fit a linear regression on the dataset. Then, we use the predict() method to generate predictions for the testing dataset.

Finally, we call the MAPE() function defined above to estimate the error in the predictions, as shown below:

#Building the Linear Regression Model
from sklearn.linear_model import LinearRegression
linear_model = LinearRegression().fit(X_train , Y_train)

#Predictions on Testing data
LR_Test_predict = linear_model.predict(X_test)

# Using MAPE error metrics to check for the error rate and accuracy level
LR_MAPE= MAPE(Y_test,LR_Test_predict)
print("MAPE: ", LR_MAPE)


MAPE: 16.628873360270358

Tips For Using Regression Metrics

  • Make sure that the evaluation metric you choose for a regression problem penalizes errors in a way that reflects their consequences for the business, organizational, or user needs of your application.
  • Outliers in the data can have an outsized influence on the overall R2 or MSE scores. MAE is robust to outliers because it uses the absolute value rather than the square. Hence, use the MAE score if downplaying outliers is important to you.
  • Because MAE does not inflate large residuals, it gives a more even-handed comparison when distinguishing between models.
  • If you want your model to be penalized more heavily for outliers, use the MSE metric.
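To illustrate the last three points, here is a minimal sketch (the numbers are purely illustrative) comparing how MAE and MSE react when a single outlier prediction is introduced:

```python
import numpy as np

def mae(y_true, y_pred):
    # Average of the absolute differences
    return np.mean(np.abs(np.array(y_true) - np.array(y_pred)))

def mse(y_true, y_pred):
    # Average of the squared differences
    return np.mean((np.array(y_true) - np.array(y_pred)) ** 2)

y_true = [10, 12, 11, 13, 12]
y_pred_clean = [11, 11, 12, 12, 13]      # every prediction off by 1
y_pred_outlier = [11, 11, 12, 12, 22]    # one prediction off by 10

print(mae(y_true, y_pred_clean), mse(y_true, y_pred_clean))      # 1.0 1.0
print(mae(y_true, y_pred_outlier), mse(y_true, y_pred_outlier))  # 2.8 20.8
```

A single outlier roughly triples the MAE but multiplies the MSE by about twenty, which is why MSE is the metric of choice when outliers must not be ignored.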

2. Mean Squared Error or MSE

MSE is calculated by taking the average of the square of the difference between the original and predicted values of the data.

Hence,

MSE = (1/N) * Σ_{i=1}^{N} (y_i - ŷ_i)^2

Here N is the total number of observations/rows in the dataset, y_i is the actual value, and ŷ_i is the predicted value. The sigma symbol denotes that the squared difference between actual and predicted values is summed over every i from 1 to N.

This can be implemented using sklearn’s mean_squared_error method:

from sklearn.metrics import mean_squared_error

actual_values = [3, -0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]

mean_squared_error(actual_values, predicted_values)

# Returns: 0.375

In most of the regression problems, mean squared error is used to determine the model’s performance.

4. R Squared

It is also known as the coefficient of determination. This metric indicates how well a model fits a given dataset, i.e. how close the regression line (the plotted predicted values) is to the actual data values. An R squared of 0 means the model does no better than always predicting the mean of the data, while 1 means the model fits the dataset perfectly. (On held-out data, R squared can even be negative when the model performs worse than the mean baseline.)

import numpy as np

X = np.random.randn(100, 1)  # sklearn expects a 2-D feature array
y = np.random.randn(100)     # y has nothing to do with X whatsoever

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score  # cross_validation was renamed to model_selection

scores = cross_val_score(LinearRegression(), X, y, scoring='r2')
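Because X and y are unrelated here, these cross-validated R2 scores will typically be negative. For a direct computation on a single set of predictions, scikit-learn also provides the r2_score function; a minimal sketch with illustrative numbers:

```python
from sklearn.metrics import r2_score

# Hypothetical actual and predicted values for illustration
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

score = r2_score(actual, predicted)
print(score)  # 0.98
```

A score this close to 1 indicates the predictions track the actual values very well.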

Hands-On Example of Regression Metrics

In order to understand regression metrics, it’s best to get hands-on experience with a real dataset. In this tutorial, we will show you how to make a simple linear regression model in scikit-learn and then calculate the metrics that we have previously explained.

Mean Forecast Error (or Forecast Bias)

The mean forecast error is calculated as the average of the forecast errors.

mean_forecast_error = mean(forecast_error)

Forecast errors can be positive or negative. This means that when we compute the average of these values, an ideal mean forecast error would be zero.

A mean forecast error other than zero indicates a tendency of the model to over-forecast (negative error, since the error is computed as expected minus predicted) or under-forecast (positive error). For this reason, the mean forecast error is also called the forecast bias.

The forecast bias can be calculated directly as the mean of the forecast errors. The example below shows how the mean of the forecast errors can be calculated manually.

expected = [0.0, 0.5, 0.0, 0.5, 0.0]
predictions = [0.2, 0.4, 0.1, 0.6, 0.2]
forecast_errors = [expected[i] - predictions[i] for i in range(len(expected))]
bias = sum(forecast_errors) * 1.0 / len(expected)
print('Bias: %f' % bias)

Running the example prints the mean forecast error, also known as the forecast bias.

Bias: -0.100000

The units of the forecast bias are the same as the units of the forecasts. A forecast bias of zero, or a very small number near zero, indicates an unbiased model.
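As a quick sketch with made-up numbers, a model that consistently predicts too high produces a negative bias under the expected-minus-predicted convention used above, while a model whose errors cancel out looks unbiased even though every individual prediction is wrong:

```python
expected = [1.0, 2.0, 3.0, 4.0]
over_forecast = [1.5, 2.5, 3.5, 4.5]   # always 0.5 too high
balanced = [1.5, 1.5, 3.5, 3.5]        # errors cancel out

bias_over = sum(e - p for e, p in zip(expected, over_forecast)) / len(expected)
bias_balanced = sum(e - p for e, p in zip(expected, balanced)) / len(expected)

print(bias_over)      # -0.5  (systematic over-forecast)
print(bias_balanced)  # 0.0   (unbiased, yet every prediction is off by 0.5)
```

This is why bias should be read together with a magnitude metric such as MAE: zero bias alone does not mean zero error.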

How Does the MAE Compare to MSE?

The mean absolute error and the mean squared error are two common measures for evaluating the performance of regression models. There are a number of key differences between the two:

  • Unlike the mean squared error (MSE), the MAE calculates the error on the same scale as the data. This means it’s easier to interpret.
  • The MAE doesn't square the differences, so it is less susceptible to outliers.

Both values are negatively-oriented. This means that, while both range from 0 to infinity, lower values are better.

3. Root Mean Squared Error or RMSE

RMSE is the standard deviation of the errors that occur when a prediction is made on a dataset. It is computed like MSE (Mean Squared Error), but the square root of the value is taken, which brings the metric back to the same scale as the data.

from sklearn.metrics import mean_squared_error
from math import sqrt

actual_values = [3, -0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]

# taking the root of the mean squared error
root_mean_squared_error = sqrt(mean_squared_error(actual_values, predicted_values))
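The same value can be computed with plain NumPy, which makes the relationship between MSE and RMSE explicit (same illustrative numbers as above):

```python
import numpy as np

actual_values = np.array([3, -0.5, 2, 7])
predicted_values = np.array([2.5, 0.0, 2, 8])

mse = np.mean((actual_values - predicted_values) ** 2)  # 0.375
rmse = np.sqrt(mse)

print(rmse)  # ~0.612
```

Since the MSE here is 0.375, the RMSE is simply its square root, about 0.612, expressed in the original units of the data.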

Mean Absolute Error

The mean absolute error, or MAE, is calculated as the average of the forecast errors, where all error values are forced to be positive.

Forcing values to be positive is called taking their absolute value. This is denoted by the absolute function abs(), or shown mathematically as two pipe characters around the value: |value|.

mean_absolute_error = mean( abs(forecast_error) )

where abs() makes the values positive, forecast_error is one or a sequence of forecast errors, and mean() calculates the average value.

We can use the mean_absolute_error() function from the scikit-learn library to calculate the mean absolute error for a list of predictions. The example below demonstrates this function.

from sklearn.metrics import mean_absolute_error
expected = [0.0, 0.5, 0.0, 0.5, 0.0]
predictions = [0.2, 0.4, 0.1, 0.6, 0.2]
mae = mean_absolute_error(expected, predictions)
print('MAE: %f' % mae)

Running the example calculates and prints the mean absolute error for the list of 5 expected and predicted values.

MAE: 0.140000

These error values are in the original units of the predicted values. A mean absolute error of zero means no error.

Evaluation of the model

In this step, we will evaluate the model by using the standard metrics available in sklearn.metrics. The quality of our model shows how well its predictions match up against actual values. We will assess how well the model performs against the test data using a set of standard metrics that we have previously introduced.

# model evaluation for testing set

mae = metrics.mean_absolute_error(y_test, y_predicted)
mse = metrics.mean_squared_error(y_test, y_predicted)
r2 = metrics.r2_score(y_test, y_predicted)

print("The model performance for testing set")
print('MAE is {}'.format(mae))
print('MSE is {}'.format(mse))
print('R2 score is {}'.format(r2))

The mean absolute error is 3.55, which shows that our algorithm is not perfectly accurate, but it can still make good predictions.

The value of the mean squared error is 26.36, noticeably larger than the squared MAE (about 12.6), which suggests that some predictions have large errors, i.e. there are outliers.

The R2 score is 0.66, which shows that our model doesn't fit the data very well because it cannot explain all the variance.

Considering our regression metrics, we can conclude that the model can be further improved. At this point, we could consider adding more features or trying to fit a different regression model.

Actual vs Predicted graph

Before looking at the metrics and plain numbers, we should first plot our data on the Actual vs Predicted graph for our test dataset. This is one of the most useful plots because it can tell us a lot about the performance of our model. The plot below uses Matplotlib to visualize how the predictions line up against the actual values.

fig, ax = plt.subplots()
ax.scatter(y_predicted, y_test, edgecolors=(0, 0, 1))
ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=3)
ax.set_xlabel('Predicted')
ax.set_ylabel('Actual')
plt.show()

On this plot, we can check out where the points lie. We can see that some points are far away from the diagonal line and we can conclude that the R2  score will be low. This shows us that the model doesn’t fit the data very well. Now, we will make the same conclusion by solely observing magnitudes of regression metrics.

Use Sklearn to Calculate the Mean Absolute Error (MAE)

In this section, you'll learn how to use sklearn to calculate the mean absolute error. Scikit-learn comes with a function for calculating the mean absolute error, mean_absolute_error. As with many other metrics, this function is in the metrics module.

Let’s see what the function looks like:

# Importing the function
from sklearn.metrics import mean_absolute_error


The function takes two important parameters, the true values and the predicted values.

Now let’s recreate our earlier example with this function:

import numpy as np
from sklearn.metrics import mean_absolute_error

true = [1,2,3,4,5,6]
predicted = [1,3,4,4,5,9]

print(mean_absolute_error(true, predicted))

# Returns: 0.833

Mean Absolute Percentage Error with Python scikit learn library

In this example, we implement MAPE using the Python sklearn library.

The sklearn library offers the mean_absolute_percentage_error() function (available since scikit-learn 0.24) to calculate the MAPE value. Note that it returns a ratio, so we multiply by 100 to express it as a percentage, as shown below:


from sklearn.metrics import mean_absolute_percentage_error
Y_actual = [1, 2, 3, 4, 5]
Y_Predicted = [1, 2.5, 3, 4.1, 4.9]
mape = mean_absolute_percentage_error(Y_actual, Y_Predicted) * 100  # mape is 5.9


Mean Squared Error

The mean squared error, or MSE, is calculated as the average of the squared forecast errors. Squaring the forecast error values forces them to be positive; it also has the effect of putting more weight on large errors.

Very large or outlier forecast errors are squared, which drags the mean of the squared forecast errors upward and results in a larger mean squared error score. In effect, the score assigns worse performance to models that make large incorrect forecasts.

mean_squared_error = mean(forecast_error^2)

We can use the mean_squared_error() function from scikit-learn to calculate the mean squared error for a list of predictions. The example below demonstrates this function.

from sklearn.metrics import mean_squared_error
expected = [0.0, 0.5, 0.0, 0.5, 0.0]
predictions = [0.2, 0.4, 0.1, 0.6, 0.2]
mse = mean_squared_error(expected, predictions)
print('MSE: %f' % mse)

Running the example calculates and prints the mean squared error for the list of expected and predicted values.

MSE: 0.022000

The error values are in squared units of the predicted values. A mean squared error of zero indicates perfect skill, or no error.

The coefficient of determination (R2 score)

R2 score determines how well the regression predictions approximate the real data points.

The value of R2 is calculated with the following formula:

R2 = 1 - Σ (y_i - ŷ_i)^2 / Σ (y_i - ȳ)^2

where ŷ_i represents the predicted value of y_i and ȳ is the mean of the observed data, which is calculated as

ȳ = (1/n) * Σ_{i=1}^{n} y_i

R2 typically takes values from 0 to 1, and a value of 1 indicates that the regression predictions perfectly fit the data. (It can become negative when a model fits the data worse than simply predicting the mean.)
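The formula above can be checked with a short NumPy sketch (the numbers are purely illustrative):

```python
import numpy as np

y = np.array([3.0, -0.5, 2.0, 7.0])      # observed values
y_hat = np.array([2.5, 0.0, 2.0, 8.0])   # predicted values

ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
r2 = 1 - ss_res / ss_tot

print(r2)  # ~0.9486
```

This manual computation matches what sklearn's r2_score returns for the same inputs.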

The rationale behind the model

The dataset that we will use is the Boston Housing Dataset, and the task of our model will be to predict the price of a house. Let's say that we are an estate agent and that we want to quickly determine the price of a house in Boston. We can do this by creating a model that considers the features of the property, such as the number of rooms, the air quality, the crime rate in the town, etc. We will build the model on the features of the training data and then use it to predict values for new data.

How do You Interpret the Mean Absolute Error

Interpreting the MAE can be easier than interpreting the MSE. Say that you have a MAE of 10. This means that, on average, the predictions are 10 units away from the actual values.

In any case, the closer the value of the MAE is to 0, the better. That said, the interpretation of the MAE is completely dependent on the data. In some cases, a MAE of 10 can be incredibly good, while in others it can mean that the model is a complete failure.

The interpretation of the MAE depends on:

  1. The range of the values,
  2. The acceptability of error

For example, in our earlier example of a MAE of 10, if the values ranged from 10,000 to 100,000, a MAE of 10 would be great. However, if the values ranged from 0 through 20, a MAE of 10 would be terrible.

The MAE can often be interpreted more easily in conjunction with the mean absolute percentage error (MAPE). Calculating them together lets you see the scope of the error relative to your data.
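As a quick sketch with made-up numbers, the same MAE of 10 looks very different depending on the scale of the targets, which MAPE makes explicit:

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(np.array(y_true) - np.array(y_pred)))

def mape(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

large = [50_000, 60_000, 70_000]       # e.g. house prices
small = [15, 25, 35]                   # e.g. temperatures
pred_large = [v + 10 for v in large]   # each prediction off by 10
pred_small = [v + 10 for v in small]   # each prediction off by 10

print(mae(large, pred_large), mape(large, pred_large))   # 10.0 ~0.017%
print(mae(small, pred_small), mape(small, pred_small))   # 10.0 ~45%
```

Both models have an identical MAE of 10, yet the first is off by a tiny fraction of a percent while the second misses by roughly 45%.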


Now, let’s start our coding! First, we need to import the necessary libraries:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn import datasets
%matplotlib inline

We will use the Boston Housing Dataset which can be accessed from the scikit-learn library.

boston_data = datasets.load_boston()
# Note: load_boston was removed in scikit-learn 1.2; in newer versions the
# dataset has to be loaded from an external source.

Our dataset is a dictionary that contains key:value pairs. We can check them out by printing the keys.


print(boston_data.keys())

This dataset contains 506 samples and 13 feature variables. The objective is to predict the prices of the house using the given features.


By printing the description of the dataset, we can see more information about it and the features that it contains.


print(boston_data.DESCR)

Our features will be stored in x variable and output values (prices of properties) will be stored in y. Output values that are stored in y are the target values that our model will try to predict.

# Input Data
x = boston_data.data
# Output Data
y = boston_data.target

After that, we need to split our data into train and test splits.

X_train, X_test, y_train, y_test = train_test_split(x,y,test_size=1/3, random_state=0)

For the prediction, we will use the Linear Regression model. This model is available as the part of the sklearn.linear_model module. We will fit the model using the training data.

model = LinearRegression().fit(X_train, y_train)

Once we train our model, we can use it for prediction. We will predict the prices of properties from our test set.

y_predicted = model.predict(X_test)


By this, we have come to the end of this topic. Feel free to comment below, in case you come across any question.

For more such posts related to Python, Stay tuned here and till then, Happy Learning!!


Importance of Model Evaluation

Being able to correctly measure the performance of a machine learning model is a critical skill for every machine learning practitioner. In order to assess the performance of the model, we use evaluation metrics.

Depending on the type of problem that we want to solve, we can perform classification (where a categorical variable is predicted) or regression (where a real number is predicted) in order to solve it. Luckily, the scikit-learn library allows us to create regressions easily, without having to deal with the underlying mathematical theory. 

In this article, we will demonstrate how to perform linear regression on a given dataset and evaluate its performance using:

  • Mean absolute error
  • Mean squared error
  • R2 score (the coefficient of determination)

Calculate the Mean Absolute Error in Python

In this section, you’ll learn how to calculate the mean absolute error in Python. In the next section, you’ll learn how to calculate the MAE using sklearn. However, it can be helpful to understand the mechanics of a calculation.

We can define a custom function to calculate the MAE. This is made easier using numpy, which can easily iterate over arrays.

# Creating a custom function for MAE
import numpy as np

def mae(y_true, predictions):
    y_true, predictions = np.array(y_true), np.array(predictions)
    return np.mean(np.abs(y_true - predictions))

Let’s break down what we did here:

  1. We imported numpy to make use of its array methods
  2. We defined a function mae, that takes two arrays (true values and predictions)
  3. We converted the two arrays into Numpy arrays
  4. We calculated the mean of the absolute differences between iterative values in the arrays

Let’s see how we can use this function:

# Calculating the MAE with a custom function
import numpy as np

def mae(y_true, predictions):
    y_true, predictions = np.array(y_true), np.array(predictions)
    return np.mean(np.abs(y_true - predictions))

true = [1,2,3,4,5,6]
predicted = [1,3,4,4,5,9]

print(mae(true, predicted))

# Returns: 0.833

We can see that in the example above, a MAE of 0.833 was returned. This means that, on average, the predicted values will be 0.833 units off.

In the following section, you’ll learn how to use sklearn to calculate the MAE.

