Outliers Alert in Dashboards (Regression Models)

Spot outliers in regression models, clip them, and improve predictions with Pecan’s Outliers Alert

Written by Ori Sagi
Updated over 3 weeks ago

Overview

Outliers are extreme data points that differ significantly from most other values in a dataset. These unusual values can sometimes arise from rare events or anomalies. In regression models, outliers in the training set may harm the training process by skewing the model’s focus toward unpredictable patterns.

To help identify and address outliers, Pecan now provides an Outliers Alert in the model dashboard for production-quality regression models.

How It Works

1. Detecting Outliers

Pecan analyzes the Label (the target variable) of your regression model and identifies label values that fall outside the expected range of the label distribution (i.e., values considerably higher or lower than the majority of values).
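The article does not say which statistical rule Pecan uses to decide that a label is "considerably higher or lower" than the rest. As an illustration only, the sketch below assumes Tukey's IQR fences (values more than 1.5 interquartile ranges below the first quartile or above the third quartile count as outliers). The function names `healthy_range` and `find_outliers` and the parameter `k` are hypothetical, not part of Pecan.

```python
from statistics import quantiles

def healthy_range(labels, k=1.5):
    """Return (low, high) bounds for the label.

    Assumption: Tukey's IQR fences stand in for Pecan's undocumented rule.
    """
    # Quartiles of the label values; "inclusive" treats the data as the full population.
    q1, _, q3 = quantiles(labels, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(labels, k=1.5):
    """Return the label values that fall outside the healthy range."""
    low, high = healthy_range(labels, k)
    return [y for y in labels if y < low or y > high]

labels = [10, 12, 11, 13, 12, 11, 10, 95]
print(healthy_range(labels))  # (8.5, 14.5)
print(find_outliers(labels))  # [95]
```

A larger `k` makes the rule more tolerant, so fewer values are flagged.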

2. Healthy Range

For each regression model, Pecan calculates a ‘healthy range’ defined by a minimum and a maximum acceptable label value. A Health Check is triggered if any outliers are found in the training set (which includes both the training and validation subsets).
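The health-check logic can be sketched as follows. This is not Pecan's implementation: it assumes the healthy range is computed over the training and validation subsets combined, and it reuses the IQR-fence assumption from the previous sketch (repeated here so the example stands alone). `outlier_health_check` and its return tuple are hypothetical.

```python
from statistics import quantiles

def healthy_range(labels, k=1.5):
    # Hypothetical: Tukey IQR fences stand in for Pecan's healthy range.
    q1, _, q3 = quantiles(labels, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outlier_health_check(train_labels, validation_labels, k=1.5):
    """Return (triggered, low, high, n_outliers) for the combined training set.

    Assumption: both subsets are pooled before the range is computed.
    """
    combined = list(train_labels) + list(validation_labels)
    low, high = healthy_range(combined, k)
    n_outliers = sum(1 for y in combined if y < low or y > high)
    # The check fires if even a single label is outside the healthy range.
    return n_outliers > 0, low, high, n_outliers

print(outlier_health_check([10, 12, 11, 13], [12, 11, 10, 95]))
# (True, 8.5, 14.5, 1)
```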

3. Dashboard Alert

When outliers are detected, an alert will appear in your model’s dashboard. You can review the details and decide whether to adjust your data to mitigate these extreme values.

What to Do If Your Model Has Outliers?

You can duplicate your model and train it again with the remove (clip) outliers setting enabled:

1. Duplicate your Predictive Notebook

Go to the Prepare Data tab at the top of the dashboard, and click Duplicate. Then click Train model, run validations, and click Continue to model training.

2. Change the Training Configuration

Set the Training mode to Production Grade.

3. Enable Outlier Removal

Select the option to remove outliers:
Values above the healthy range will be clipped to the healthy maximum.
Values below the healthy range will be clipped to the healthy minimum.
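The two clipping rules above amount to clamping each label into the healthy range: values are replaced by the nearest bound rather than dropped, so no rows are removed. A minimal sketch, assuming the bounds are already known (`clip_labels` is a hypothetical helper, not a Pecan function):

```python
def clip_labels(labels, low, high):
    """Clamp each label into [low, high] and count how many values changed."""
    # Above the maximum -> set to high; below the minimum -> set to low.
    clipped = [min(max(y, low), high) for y in labels]
    n_changed = sum(1 for y, c in zip(labels, clipped) if y != c)
    return clipped, n_changed

print(clip_labels([5, 10, 95], 8.5, 14.5))  # ([8.5, 10, 14.5], 2)
```

With NumPy arrays, `numpy.clip(a, low, high)` applies the same rule.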

4. Send Your Model To Train

Click Train model to train your model again with the outlier removal setting enabled.


Important Notes

Production Quality Models Only

Outlier alerts and removal options are available only for production-quality regression models.
Production Grade training takes longer than fast training, because Pecan removes the outliers and also engineers more complex features. Learn more here.

Outliers in the Test Set

Outliers in the test set are typically less of a concern, because what matters most is that your model is trained on “clean” data. Removing outliers from the training data may improve model performance by reducing skew. However, you should evaluate whether the outliers themselves are critical data points that need special attention.

If a large number of outliers appear in the test set, it may indicate a mismatch between training and test data distributions and could raise questions about the fairness or reliability of your evaluation.


Need More Help?

If you have questions about how outliers are detected, how to configure outlier removal, or how to interpret alerts in your dashboard, feel free to reach out to our Support Team. We’re here to help you get the best possible results from your regression models!
