Model performance metrics for regression models
Written by Ori Sagi

In a regression model, predicted and actual results exist on a continuum. Instead of being classified into mutually exclusive populations (e.g. with a label of “0” or “1”), results are represented as numbers.

Whereas a binary model separates a population into two classes (one of which is of interest and will receive a business treatment), in a regression model, each member of the population is of interest. What may differ between them is the degree of interest.

Therefore, for regression models, you need metrics that take into account each member of the population. Below is a list of metrics that are useful for evaluating the results of a regression model in Pecan:

# Median Absolute Percentage Error (MdAPE)

Median Absolute Percentage Error, a common metric for forecasting error, reflects the prediction accuracy of a regression model. It calculates the absolute percentage error for each prediction, and then determines the median value.

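This allows you to make the statement: “This model has a median margin of error of X%.” The metric is expressed as a percentage. In its standard form, where $y_i$ is the actual value and $\hat{y}_i$ the predicted value for each of the $n$ predictions:

$$\mathrm{MdAPE} = \underset{i=1,\dots,n}{\operatorname{median}}\left(\left|\frac{\hat{y}_i - y_i}{y_i}\right|\right) \times 100\%$$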

For example: a \$100 prediction vs. a \$110 actual outcome produces a percentage error of roughly 9.1% (a \$10 error divided by the \$110 actual value). It is “absolute” because it doesn’t consider the direction of the error. The median value can then be calculated for all errors.

In general, a percentage approaching zero indicates a better model. But what is practically considered “a good percentage” will depend on your business benchmarks.

MdAPE is particularly useful when dealing with an unbalanced distribution of results, since using a median makes it less sensitive to outliers. If there is a normal distribution of results, it will have a very similar value to the MAPE (which uses the mean error value).
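The outlier behaviour described above can be illustrated with a minimal sketch. It assumes the standard definition, in which each absolute percentage error is taken relative to the actual value. The function names and the sample data are hypothetical and are not part of Pecan.

```python
from statistics import mean, median


def absolute_percentage_errors(actual, predicted):
    """Absolute percentage error of each prediction, relative to its actual value."""
    return [abs(p - a) / abs(a) * 100 for a, p in zip(actual, predicted)]


def mdape(actual, predicted):
    """Median of the absolute percentage errors."""
    return median(absolute_percentage_errors(actual, predicted))


def mape(actual, predicted):
    """Mean of the absolute percentage errors."""
    return mean(absolute_percentage_errors(actual, predicted))


# Hypothetical data: four close predictions and one outlier (actual 10, predicted 60).
actual = [100, 200, 50, 400, 10]
predicted = [110, 190, 55, 380, 60]

print(round(mdape(actual, predicted), 2))  # 10.0
print(round(mape(actual, predicted), 2))   # 106.0
```

The single 500% error barely moves the median but pulls the mean above 100%.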

In Pecan, MdAPE is displayed in the dashboard for regression models (except for LTV models). It can also be used as an optimization metric for regression models.

# Mean Absolute Percentage Error (MAPE)

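This metric is nearly identical to MdAPE; the difference is that MAPE calculates the mean absolute percentage error instead of the median. In its standard form, where $y_i$ is the actual value and $\hat{y}_i$ the predicted value for each of the $n$ predictions:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right|$$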

The downside of using Mean Absolute Percentage Error is that not all errors are equal, and if you have an unbalanced distribution of results, the MAPE may be high even if the errors are not significant overall.

Let’s say you have a user who is predicted to spend \$2 but spends only \$1 – this gives an error of 100%. Then you have a user who is predicted to spend \$1,000 but actually spends \$1,100 – an error of only about 9%. Even though the second prediction scores far better under MAPE, it misses by \$100 while the first misses by just \$1, so the percentages are clearly not a fair representation of error value. This issue is resolved with Weighted MAPE.

In Pecan, MAPE may be displayed in the dashboard for regression models (except for LTV models). It can also be used as an optimization metric for regression models.

# Weighted MAPE (WMAPE)

Like MdAPE and MAPE, Weighted Mean Absolute Percentage Error expresses prediction accuracy as a measure of error. The difference is that it weighs each prediction error relative to the results of the entire dataset (instead of calculating each error independently). In other words: the larger a prediction error is in absolute terms, the greater its effect on the score.

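For predictive models in Pecan, Weighted MAPE is typically considered a more reliable metric than the non-weighted variations of absolute percentage error. In its standard form, where $y_i$ is the actual value and $\hat{y}_i$ the predicted value for each of the $n$ predictions:

$$\mathrm{WMAPE} = \frac{\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|}{\sum_{i=1}^{n}\left|y_i\right|}$$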

For example: a \$1 error has the same impact on the score whether it occurred for a \$10 purchase or a \$100 purchase, because each error is measured against the total of all actual values. (Compare this to MAPE, where a \$1 error counts as 10% for a \$10 purchase but only 1% for a \$100 purchase.)
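To make the contrast with MAPE concrete, here is a minimal sketch. It assumes the standard definitions: MAPE averages per-record percentage errors, while Weighted MAPE divides the total absolute error by the total of the actual values. The function names and the two-user dataset (one user predicted at \$2 who spends \$1, one predicted at \$1,000 who spends \$1,100) are illustrative and not taken from Pecan's implementation.

```python
def mape(actual, predicted):
    """Mean of per-record absolute percentage errors, as a percentage."""
    errors = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors) * 100


def wmape(actual, predicted):
    """Total absolute error divided by total actual value, as a percentage."""
    total_abs_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    total_actual = sum(abs(a) for a in actual)
    return total_abs_error / total_actual * 100


# Hypothetical two-user dataset.
actual = [1, 1100]
predicted = [2, 1000]

print(round(mape(actual, predicted), 2))   # 54.55
print(round(wmape(actual, predicted), 2))  # 9.17
```

MAPE is dominated by the \$1 miss on the small purchase, while Weighted MAPE reflects that \$101 of error was made on \$1,101 of actual spend.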

This approach “levels the playing field”, making it possible to compare model performance for disparate datasets. For example: you can evaluate a model’s ability to predict the outcomes of entirely different marketing campaigns or channels. (However, in some cases, it may be necessary to split your dataset into multiple cohorts based on business logic – like by campaign – find the Weighted MAPE for each of them, and then use their average to score the model.)

In Pecan, Weighted MAPE is displayed in the dashboard for LTV models. It is also a component of WMAPE by WMPE, which is the default optimization metric for regression models.

# Weighted MPE (WMPE)

Like Weighted MAPE, Weighted Mean Percentage Error calculates the mean prediction error of the population and gives greater weight to errors of greater value.

The difference is that it does not use absolute values. Instead, it factors in whether each error is a positive or negative value – thereby expressing direction. A value above zero represents an over-prediction, and a value below zero represents an under-prediction.

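Weighted MPE is calculated as follows, in its standard form, where $y_i$ is the actual value and $\hat{y}_i$ the predicted value for each of the $n$ predictions:

$$\mathrm{WMPE} = \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)}{\sum_{i=1}^{n} y_i}$$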

For example: if a model predicts total revenue of \$100,000 from all users, and the actual revenue turned out to be \$102,000, the Weighted MPE would be approximately -0.02 (an under-predicting model). But if the actual revenue turned out to be \$98,000, the score would be approximately +0.02 (an over-predicting model).

Since Weighted MPE sums the signed errors of all predictions, positive and negative errors cancel each other out. This means the metric will usually be closer to zero than Weighted MAPE, whose absolute values do not cancel each other out.

This makes for an interesting insight: if the absolute value of your Weighted MPE is very close to your Weighted MAPE, it means that most of your errors are biased in one direction. This indicates that your model is consistently over-predicting or under-predicting.
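The cancellation effect and the bias check can be sketched as follows. This assumes the standard definitions (signed errors summed for Weighted MPE, absolute errors summed for Weighted MAPE, both divided by the total of positive actual values). The function names and the datasets are hypothetical.

```python
def wmpe(actual, predicted):
    """Sum of signed errors divided by the sum of actual values."""
    return sum(p - a for a, p in zip(actual, predicted)) / sum(actual)


def wmape(actual, predicted):
    """Sum of absolute errors divided by the sum of actual values."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / sum(actual)


actual = [100, 200, 300, 400]
balanced = [110, 190, 310, 390]  # over- and under-predictions cancel out
biased = [110, 220, 330, 440]    # every prediction is 10% too high

print(wmpe(actual, balanced), wmape(actual, balanced))  # 0.0 0.04
print(wmpe(actual, biased), wmape(actual, biased))      # 0.1 0.1
```

In the balanced case the Weighted MPE is zero even though the model is not perfect. In the biased case the two metrics coincide, which signals a one-directional error.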

In Pecan, Weighted MPE is displayed in the dashboard for LTV (Lifetime Value) models. It is also a component of WMAPE by WMPE, which is the default optimization metric for regression models.

# Weighted MAPE by Weighted MPE (WMAPE by WMPE)

Weighted MAPE and Weighted MPE are distinct metrics listed separately in the dashboard for LTV models in Pecan. However, Pecan calculates “Weighted MAPE by Weighted MPE” as an effective and reliable metric for evaluating the performance of regression models. Indeed, it is the default optimization metric for regression models.

To learn how it’s calculated and used as an optimization metric, see WMAPE by WMPE.

# Explained Variance (R2)

Unlike the above metrics which measure error, R2 detects the level of correlation between a model’s predictions and the actual results. It indicates the percentage of variance that’s explained by the model (and not by the data itself). The higher the score, the better a predictive model can explain variance based on the features that have been fed into the model.

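R2 is calculated by summing the squared distances between predicted values and actual values, dividing that sum by the sum of the squared distances between the actual values and their mean, and subtracting the result from 1. More simply, it is the share of the total variance that is explained by the model. In its standard form, where $y_i$ is the actual value, $\hat{y}_i$ the predicted value, and $\bar{y}$ the mean of the actual values:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$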

A score close to 1 (or 100% in percentage terms) means your model has achieved a high correlation between predictions and actual results. A score closer to 0 indicates poor predictive performance due to a poor model and/or large amount of variance in your dataset.

As an example, a score of 0.71 means that 71% of the variability in the actual results is accounted for by the model’s predictions, and the remaining 29% is left unexplained.

Using R2, you can ensure that most of the variability in predictions is attributed to your model – and not variability in your data. This provides a measure of how well results can be replicated by the model, using a given benchmark.
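As a minimal sketch of the calculation, assuming the standard definition (1 minus the ratio of the residual sum of squares to the total sum of squares), the score can be computed as shown here. The function name `r2_score` and the sample values are illustrative only.

```python
def r2_score(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [10, 20, 30, 40, 50]
predicted = [12, 18, 33, 37, 50]

print(round(r2_score(actual, predicted), 3))  # 0.974

# A "model" that always predicts the mean of the actual values explains nothing.
always_mean = [30, 30, 30, 30, 30]
print(r2_score(actual, always_mean))  # 0.0
```

The second call shows the baseline that R2 is measured against: predicting the mean for every record yields a score of 0.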

Graphical representation

R2 measures goodness of fit by calculating how far each dot is from the fitted regression line (the basis of the model’s predictions). This distance is known as the “residual”. In the below illustration, actual outcomes are plotted along a linear regression line.

The more scattered the points, the more variance is accounted for by the values themselves, and the worse the predictive ability of the model (giving a lower R2 score). To put it another way: the further points are from the regression line (the greater the residuals), the more they punish the model (the greater the impact on the R2 score).

This metric is particularly useful for models that are designed to establish a trend (e.g. attempting to rank customers between 0 and 1). Each prediction may be off by a certain amount, but you would want to ensure that higher predictions correspond with higher actual results.

In Pecan, R2 is displayed in the dashboard for regression models (except for LTV models). It can also be used as an optimization metric for regression models.

# Root Mean Squared Error (RMSE)

Root Mean Squared Error calculates the average distance between a model’s predicted values and the actual values in the dataset (this distance is known as the “residual” or prediction error). In doing so, it tells you how concentrated the actual data is around the line of best fit (the model’s regression line).

RMSE is the standard deviation of the residuals of a dataset. It’s calculated by squaring the distance between the actual values (“y”) and the predicted values (“ŷ”), calculating the mean, and then taking its square root. The lower the score, the better the model is able to “fit” the dataset – indicating better predictive performance. In its standard form, for $n$ predictions:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$

This metric is similar to Explained Variance (R2), but it gives more weight to larger prediction errors – and penalizes outliers – because it squares the residual (the prediction error). As a result, it also produces predictions that are closer to the mean than to the median.

Therefore, you would use this metric if you wish to penalize large errors and have predictions that are more on the “conservative” side.
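The extra weight given to large errors can be seen in a minimal sketch, assuming the standard RMSE definition. The function name and the data are hypothetical: both prediction sets have the same total absolute error (40), but one concentrates it in a single record.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean squared residual."""
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


actual = [100, 100, 100, 100]
spread_out = [110, 110, 110, 110]  # four errors of 10
one_large = [100, 100, 100, 140]   # a single error of 40

print(rmse(actual, spread_out))  # 10.0
print(rmse(actual, one_large))   # 20.0
```

Although the total error is identical, the single large miss doubles the score because residuals are squared before averaging.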

In Pecan, RMSE may be displayed in the dashboard for a regression model, and it may be used as an optimization metric for regression models.

# Root Mean Squared Logarithmic Error (RMSLE)

Like RMSE, this metric measures how spread out the residuals are in a predictive model (where a “residual” is the distance between an actual result and the model’s regression line). This tells you how concentrated the actual data is around the line of best fit. The lower the score, the better the model.

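The difference is that Root Mean Squared Logarithmic Error applies a log transformation to the predicted and actual values before measuring the distance between them. It then squares that difference, calculates the mean, and takes the square root. In its commonly used form (which adds 1 before taking the logarithm so that zero values can be handled), where $y_i$ is the actual value and $\hat{y}_i$ the predicted value:

$$\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log(\hat{y}_i + 1) - \log(y_i + 1)\right)^2}$$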

Whereas RMSE emphasizes the impact of outliers, RMSLE reduces the impact of them. This is appropriate when the magnitude of outliers is not meaningful – that is, when the magnitude doesn't significantly affect the usefulness of the model. Because of the log transformation, RMSLE also penalizes an under-prediction more heavily than an over-prediction of the same size. Accordingly, it’s appropriate for business use-cases where over-predicting is more acceptable than under-predicting.
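The asymmetry between over- and under-prediction can be checked with a minimal sketch. It assumes the common `log(1 + x)` form of RMSLE, which may differ in detail from Pecan's implementation. The function name and values are illustrative.

```python
import math


def rmsle(actual, predicted):
    """Root of the mean squared difference between log(1 + predicted) and log(1 + actual)."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Same absolute miss of 50 on an actual value of 100, in opposite directions.
over = rmsle([100], [150])
under = rmsle([100], [50])

print(over < under)  # True: the under-prediction is penalized more heavily
```

Both predictions miss by 50, but the under-prediction receives the higher score because the logarithm compresses large values more than small ones.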

In Pecan, RMSLE may be displayed in the dashboard for a regression model, and it may be used as an optimization metric for regression models.

# Root Mean Squared Percentage Error (RMSPE)

Like RMSE, this metric measures the distance between a model’s predicted values and the actual values in the dataset (a.k.a. the “residual” or prediction error) – thus indicating how concentrated the actual data is around the line of best fit. The lower the score, the better the predictive performance of the model.

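The difference is that Root Mean Squared Percentage Error “scales down” the magnitude of the final value. Each residual is divided by the observed (actual) value to give a percentage error, which is then squared. The squared percentage errors are averaged over the number of samples before the square root is taken. In its standard form, where $y_i$ is the actual value and $\hat{y}_i$ the predicted value:

$$\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\hat{y}_i - y_i}{y_i}\right)^2} \times 100\%$$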

This approach measures each error relative to the size of the actual value, which reduces the influence of large absolute errors on high-value records. It is appropriate for business use-cases where the magnitude of either over-predicting or under-predicting doesn’t significantly affect the usefulness of the model.
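A minimal sketch of this scaling, assuming the standard RMSPE definition (each residual divided by its actual value before squaring), is shown here. The function name and data are hypothetical.

```python
import math


def rmspe(actual, predicted):
    """Root of the mean squared percentage error, as a percentage."""
    squared_pct = [((p - a) / a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_pct) / len(squared_pct)) * 100


actual = [10, 1000]

# A 1-unit miss on the small record and a 100-unit miss on the large record
# are both 10% errors, so they move the score by the same amount.
small_miss = rmspe(actual, [11, 1000])
large_miss = rmspe(actual, [10, 1100])

print(round(small_miss, 2))  # 7.07
print(round(large_miss, 2))  # 7.07
```

Under RMSE the second case would score far worse than the first, whereas RMSPE treats the two equally because it only looks at relative error.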

In Pecan, RMSPE may be displayed in the dashboard for a regression model, and it may be used as an optimization metric for regression models.