Dashboard overview for regression models

Explore your Pecan AI model's insights with a dashboard: track performance, compare predictions, analyze feature importance, and more!

Written by Linor Ben-El
Updated over a week ago

Once your Pecan model is fully trained, you can view its performance in an interactive dashboard. Pecan’s dashboard provides statistical information and tools so you can assess the accuracy of your model, tailor it to your business needs, and understand the importance of different features in your data.

This article provides an overview of navigating and interpreting the metrics in a dashboard for a regression model.

To view the dashboard for any model you have created:

  1. Click the “Predictive Flows” tab at the top of the screen

  2. Click on a flow with a trained model

Examining Predictions on the Test Set

It's important to remember that the metrics displayed in your dashboard are calculated on your Test Set, which is the final 10% of training data that Pecan automatically sets aside during the training process. This set serves as fresh data against which the model's predictions are compared, providing an unbiased evaluation of your model's predictive performance.
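For intuition, the sketch below (plain pandas, with a hypothetical file and column name) shows what such a holdout split looks like conceptually; Pecan performs this split for you automatically.

```python
import pandas as pd

# Hypothetical training data, ordered by the prediction (marker) date.
df = pd.read_csv("training_data.csv").sort_values("marker_date")

# Conceptually, the most recent ~10% of rows is held out as a test set
# and never shown to the model during training.
cutoff = int(len(df) * 0.9)
train_set = df.iloc[:cutoff]  # used to fit the model
test_set = df.iloc[cutoff:]   # used only to evaluate its predictions
```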

Below is a breakdown of each dashboard component for a regression model, listed in order of appearance.

Model Evaluation Tab

Performance metrics

The regression dashboard highlights a set of metrics that summarize the performance of the model.

For an explanation of each of these metrics, please take a look at the Model performance metrics for regression models article.
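If you want to reproduce metrics of this kind outside the dashboard, common regression metrics such as MAE, RMSE, and R² (the exact set shown in your dashboard is covered in the linked article) can be computed with a few lines of NumPy:

```python
import numpy as np

def regression_metrics(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    errors = predicted - actual
    mae = np.mean(np.abs(errors))         # Mean Absolute Error
    rmse = np.sqrt(np.mean(errors ** 2))  # Root Mean Squared Error
    r2 = 1 - np.sum(errors ** 2) / np.sum((actual - actual.mean()) ** 2)  # R-squared
    return {"MAE": mae, "RMSE": rmse, "R2": r2}

print(regression_metrics(actual=[10, 0, 25, 5], predicted=[12, 1, 20, 5]))
```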

Predicted vs. actual vs. benchmark

This graph compares the model's predicted values with the actual values in the dataset and with the estimates of a benchmark model.

The X-axis of this graph is determined by the actual values in the dataset, divided into 100 percentiles. This arrangement allows for the representation of the model's performance across high, medium, and low percentiles of the data.

The graph offers two display modes: 'zeros included' and 'zeros excluded'. Excluding zeros can be particularly advantageous when dealing with models where a large proportion of the data consists of zeros (such as LTV). This facilitates the exploration of model performance over non-zero observations.
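For intuition, the sketch below builds such a percentile-based comparison from synthetic data with hypothetical column names; it illustrates the general idea rather than Pecan's internal implementation:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic test-set results: one row per entity, hypothetical column names.
results = pd.DataFrame({"actual": rng.gamma(2.0, 10.0, 5000)})
results["predicted"] = results["actual"] * rng.normal(1.0, 0.2, len(results))
results["benchmark"] = results["actual"].mean()  # a naive constant benchmark

# 'Zeros excluded' mode: drop entities whose actual value is zero.
nonzero = results[results["actual"] != 0].copy()

# Bin entities into 100 percentiles of the actual value and average each series.
nonzero["percentile"] = pd.qcut(nonzero["actual"], q=100, labels=False, duplicates="drop")
curve = nonzero.groupby("percentile")[["actual", "predicted", "benchmark"]].mean()
# Plotting the three columns of `curve` reproduces the shape of this graph.
```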

Accuracy among different groups

Pecan allows you to test the model's performance across different ranges of values in the data. The entities are split into three groups based on their actual value: low, medium, and high. For each group, the following details are provided:

  • Values: the range of actual values of entities in this group.

  • Group total value: the sum of the actual values of all entities in this group.

  • Group size: the proportion of the data that falls in this group.

  • Mean error: the average error between model predictions and actual values.

  • Benchmark error: the error of a simple rule-based benchmark model.

The group boundaries are configurable and can be edited to fit your needs.
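For intuition, here is a rough equivalent of this grouping in pandas, reusing the synthetic results frame from the previous sketch, with hypothetical group boundaries and an absolute-error definition of the error metrics:

```python
import numpy as np
import pandas as pd

# Same synthetic results frame as in the previous sketch.
rng = np.random.default_rng(0)
results = pd.DataFrame({"actual": rng.gamma(2.0, 10.0, 5000)})
results["predicted"] = results["actual"] * rng.normal(1.0, 0.2, len(results))
results["benchmark"] = results["actual"].mean()

# Hypothetical group boundaries; in the dashboard these limits are editable.
results["group"] = pd.cut(results["actual"],
                          bins=[-np.inf, 10, 40, np.inf],
                          labels=["low", "medium", "high"])

results["error"] = (results["predicted"] - results["actual"]).abs()
results["benchmark_error"] = (results["benchmark"] - results["actual"]).abs()

summary = results.groupby("group", observed=True).agg(
    min_value=("actual", "min"),
    max_value=("actual", "max"),
    group_total_value=("actual", "sum"),
    group_size=("actual", "size"),
    mean_error=("error", "mean"),
    benchmark_error=("benchmark_error", "mean"),
)
summary["group_size"] /= len(results)  # proportion of the data in each group
print(summary)
```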

Predicted vs. actual vs. benchmark over time

This graph shows how your model performed over a period of time by comparing your predictions against actual values and against benchmark predictions.

This graph helps you spot trends, recurring patterns, noise in the data, and unusual points that deserve a closer look.

Pressing the 'Sum' button shows the total predicted and actual values across all entities. For instance, in a model predicting how much money will be spent over time, it compares the total predicted spending with the total actual spending for each day.

The 'Average' button shows the average predicted and actual values across all entities. In the same spending model, it compares the average predicted spending with the average actual spending per entity for each day.

You can also change the graph to show data by day, week, or month.
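A rough pandas equivalent of the 'Sum' and 'Average' modes and the day/week/month toggle, using a tiny made-up frame, might look like this:

```python
import pandas as pd

# Hypothetical per-entity predictions, one row per entity and day.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]),
    "actual": [10.0, 0.0, 25.0, 5.0],
    "predicted": [12.0, 1.0, 20.0, 6.0],
})

# 'Sum' mode: total predicted vs. total actual value per day.
totals = daily.groupby("date")[["actual", "predicted"]].sum()

# 'Average' mode: average predicted vs. actual value per entity, per day.
averages = daily.groupby("date")[["actual", "predicted"]].mean()

# Re-bucketing by week or month mirrors the day/week/month toggle.
weekly_totals = daily.set_index("date")[["actual", "predicted"]].resample("W").sum()
```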

Column Importance

When your model is trained, it uses the columns from the Attribute tables to find common patterns and similarities of the Target population. The model assigns different weights to the columns according to the impact they had on predicting the Target.

The importance of each column is calculated by summing the importance of all the AI aggregations (also known as features) that were extracted from the column.

For a comprehensive explanation of the widget and how to interpret it, see Understanding Column importance.
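As a simple illustration of that calculation (with made-up column and feature names), summing per-feature importances by source column looks like this:

```python
import pandas as pd

# Hypothetical feature importances; each AI aggregation (feature) is
# derived from a source column in an Attribute table.
features = pd.DataFrame({
    "source_column": ["purchase_amount", "purchase_amount", "country", "signup_date"],
    "feature": ["sum_purchase_amount_30d", "avg_purchase_amount_90d",
                "most_common_country", "days_since_signup"],
    "importance": [0.30, 0.15, 0.25, 0.10],
})

# Column importance = the sum of the importances of all features
# extracted from that column.
column_importance = (features.groupby("source_column")["importance"]
                     .sum()
                     .sort_values(ascending=False))
print(column_importance)
```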

Model Output Tab

Located on a separate tab, this table displays a sample of 1,000 predictions from your dataset, including:

  • EntityID & Marker

  • Actual value

  • Predicted value

  • Error (predicted value - actual value)

  • The 10 features that contributed most to the prediction (shown when clicking a row).

You can download the full output table to a spreadsheet by clicking Save as CSV.
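If you'd like to analyze the exported predictions outside of Pecan, a minimal pandas sketch (with an illustrative file name and column names that may differ from your actual download) might look like this:

```python
import pandas as pd

# Load the exported output table; file and column names are illustrative.
output = pd.read_csv("model_output.csv")

# The error column equals the predicted value minus the actual value.
output["recomputed_error"] = output["predicted_value"] - output["actual_value"]

# The largest over- and under-predictions are often worth inspecting first.
print(output.nlargest(5, "recomputed_error"))
print(output.nsmallest(5, "recomputed_error"))
```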

For more details, see this article: Understanding Explainability & Prediction Details.

Training Overview Tab

This tab allows you to dive deep into the model's logic and see how each value in your data affects the model's predictions.

Entities Over Time

This graph allows you to easily spot possible issues with the number of samples used in your model over time: overall quantity, trends and patterns, and gaps or drops.

For example, you can confirm that no months are missing by mistake, or that the entire period you wanted to include is covered.
The graph also shows the split between the train and test sets, so you can see the number of samples in each set.
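As a point of reference, a count of this kind can also be produced directly from your data with pandas; the column names and train/test flag below are made up for illustration:

```python
import pandas as pd

# Hypothetical samples, each with a marker date and a train/test flag.
samples = pd.DataFrame({
    "marker_date": pd.to_datetime(["2023-11-05", "2023-12-12", "2024-01-20",
                                   "2024-02-08", "2024-02-25", "2024-03-03"]),
    "set": ["train", "train", "train", "train", "test", "test"],
})

# Count samples per month, split by train/test, to spot gaps or sudden drops.
counts = (samples
          .groupby([samples["marker_date"].dt.to_period("M"), "set"])
          .size()
          .unstack(fill_value=0))
print(counts)
```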

Top Features

Clicking on each feature will load a Feature Effect Graph (also known as a Partial Dependence Plot, or PDP) on the right side of the widget, displaying a graph based on SHAP values. This graph shows the effect of each feature and its values on your model’s predictions.

☝️ Remember:
ML models are very complex, and you cannot attribute an impact to a single feature or value in isolation, as features work together with numerous others to produce the final prediction.

A PDP graph for a continuous variable
A PDP graph for a categorical variable

The graph shows the top 10 categories or a value histogram, their average impact, and their maximum and minimum impact on the prediction.
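For intuition only, here is a sketch of how a feature-effect curve can be derived from SHAP values using the open-source shap and scikit-learn packages; this illustrates the general technique, not Pecan's internal implementation, and all data and column names are made up:

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic data with made-up feature names.
X = pd.DataFrame({
    "days_since_last_purchase": rng.integers(0, 365, 500),
    "num_past_orders": rng.integers(0, 20, 500),
})
y = 100 - 0.2 * X["days_since_last_purchase"] + 4 * X["num_past_orders"] + rng.normal(0, 5, 500)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# SHAP values attribute a contribution of each feature to each prediction.
shap_values = shap.TreeExplainer(model).shap_values(X)

# Averaging one feature's SHAP impact per observed value approximates the
# feature-effect (PDP-style) curve, including its min/max impact range.
effect = (pd.DataFrame({"value": X["days_since_last_purchase"],
                        "impact": shap_values[:, 0]})
          .groupby("value")["impact"]
          .agg(["mean", "min", "max"]))
print(effect.head())
```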

Attribute Tables

Get an insightful, tabular view of the analysis conducted on your data attributes, providing you with a deeper understanding of how your data is structured and utilized in model training.

Before diving into the details, it's crucial to remember that the analysis presented in this widget is based on your train dataset, which is about 80% of your entire dataset. This means the figures might appear smaller than anticipated, as they don't represent the full dataset.

The widget provides a comprehensive overview of each table used in your model’s training. Here's what you can discover at a glance:

  • Row and Column Count: Understand the size and complexity of your table with the total number of rows and columns.

  • Column Types: Get insights into the composition of your table with a count of date, category, and numeric columns.

  • Dropped Columns: See which columns are not utilized in model training, including their count and the reasoning behind their exclusion.

  • Entity Row Distribution: Discover the range of rows per entity, revealing the relationship type (1:1 or 1:many) within your data, in the format [min]-[max].

For an in-depth understanding, you can expand each table to view specific details about its columns:

  • Column Name: The actual name of the column as it appears in your schema.

  • Original Type: The data type assigned to the column in your data warehouse (DWH), providing a glimpse into its original format.

  • Pecan Transformation: How Pecan interprets and utilizes each column for its feature engineering process. If a column is marked as "dropped," you’ll also see why it wasn’t used for training the model.

  • Unique Values: The count of distinct values within a column, reflecting its diversity.

  • Missing Values: The number of NULL or missing entries, crucial for understanding data completeness.
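If you want to reproduce a similar column-level profile on your own raw tables (for example, to compare against the widget), a small pandas sketch with made-up data looks like this:

```python
import pandas as pd

# Hypothetical attribute table: one entity can have many rows (1:many).
attrs = pd.DataFrame({
    "entity_id": [1, 1, 2, 3, 3, 3],
    "country": ["US", "US", None, "DE", "DE", "FR"],
    "amount": [10.0, 12.5, None, 7.0, 3.2, 9.9],
})

# Column-level profile: distinct values, missing entries, and source type.
profile = pd.DataFrame({
    "unique_values": attrs.nunique(),
    "missing_values": attrs.isna().sum(),
    "original_type": attrs.dtypes.astype(str),
})
print(profile)

# Entity row distribution ([min]-[max] rows per entity) reveals 1:1 vs. 1:many.
rows_per_entity = attrs.groupby("entity_id").size()
print(f"{rows_per_entity.min()}-{rows_per_entity.max()}")
```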
