Once your Pecan model is fully trained, you can view its performance in an interactive dashboard.
Pecan's dashboard provides statistical information and tools to help you understand how accurate your model is, tailor it to your business needs, monitor predictions over time, and discover the importance of different columns in your data.
It is important to remember that the metrics displayed in your dashboard are calculated on your Test Set: the latest 10% of the training data that Pecan automatically sets aside during the training process. Pecan treats this data as "fresh data", asking your model to generate predictions for it and then comparing those predictions against the actual outcomes to evaluate your model's predictive performance.
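To make the idea concrete, here is a minimal sketch of a chronological 90/10 hold-out in Python. It is an illustration only, not Pecan's internal implementation, and the `marker` date column name is an assumption:

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, date_col: str = "marker", test_frac: float = 0.10):
    """Hold out the latest `test_frac` of rows (by date) as a test set.

    Illustrative sketch only -- the column name and exact split logic
    are assumptions, not Pecan's internal implementation.
    """
    df = df.sort_values(date_col)
    cutoff = int(len(df) * (1 - test_frac))
    train = df.iloc[:cutoff]   # earliest 90%: used for training and validation
    test = df.iloc[cutoff:]    # latest 10%: held out as "fresh" test data
    return train, test
```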
Below is a breakdown of each dashboard component for a binary model, listed in order of appearance.
If you want more information, you can always click the Explain button on your dashboard to get tailored explanations of your model's results and ask any questions you might have.
Model Evaluation Tab
Head Metric & Quality comparison - How Good Is Your Model?
By default, the first widget displays the Precision of the model and compares it to a random guess (baseline).
Imagine you have a box filled with white and purple balls, where 10% of the balls are purple. If you were to reach into the box and pull out balls without looking, statistically, about 10% of the balls you pick would be purple. This scenario represents a random guess, which relies solely on the general frequency of purple balls in the box, without any strategic selection.
Now, contrast this with your model. Rather than randomly drawing balls, the model uses patterns and information from data to predict which balls are purple. It’s like having a strategy that increases your chances of picking purple balls far above the random guess rate of 10%.
This widget visually compares the model’s precision against the success rate you’d expect from random guesses. This comparison not only showcases how much more effective the model is compared to a random guess but also highlights the sophisticated logic the model uses to identify the correct outcomes.
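To see the comparison in code, here is a minimal sketch (assuming you have exported predictions, for example via the test set CSV) of how precision compares to the random-guess baseline, which is simply the overall positive rate:

```python
import pandas as pd

# Illustrative labels and predicted classes -- not real Pecan output.
preds = pd.DataFrame({
    "actual":    [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],   # true labels
    "predicted": [1, 0, 0, 1, 1, 0, 0, 0, 0, 0],   # model's predicted class
})

baseline = preds["actual"].mean()                                 # positive rate = random-guess precision
precision = preds.loc[preds["predicted"] == 1, "actual"].mean()   # share of predicted positives that are truly positive

print(f"Random-guess baseline: {baseline:.0%}")   # 30%
print(f"Model precision:       {precision:.0%}")  # 67%
```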
Customize model evaluation for your needs
Sharing how you plan to use your predictions may change the main head metric we show you, so it best matches your use case.
All the other metrics are always available in the widgets below.
"Explore Your Model" Widgets
Dive deeper into your model's performance
Metrics Analysis
This widget presents both the Precision and Recall metrics of the model and shows the number of entities used in the test set to evaluate it. You can also change the head metric to either Precision or Recall within this section.
Threshold configuration
The threshold is a parameter in machine learning that determines the cutoff point for classifying predicted outcomes as positive or negative. It can be adjusted after training the model to meet specific business needs. Pecan allows you to adjust the threshold for optimal model performance.
By changing the threshold, the proportions of predicted positive and negative outcomes can be altered, impacting overall model performance.
It's important to note that adjusting the threshold doesn't change the model itself but instead changes the probability score used to classify predictions into two classes. Pecan sets a default threshold based on the optimal precision and recall balance to provide the best results.
The graph illustrates the distribution of probability scores for entities in negative and positive classes. A clear separation between the classes indicates that the model can effectively distinguish between them and assign low probability scores to the negative class and high probability scores to the positive class.
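As a hedged sketch of what threshold adjustment does mechanically (not Pecan's actual code), classifying the same probability scores with two different thresholds and comparing precision and recall might look like this:

```python
from sklearn.metrics import precision_score, recall_score

# Illustrative labels and probability scores.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.12, 0.55, 0.48, 0.40, 0.73, 0.91, 0.30, 0.62]

for threshold in (0.5, 0.7):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    print(
        f"threshold={threshold}: "
        f"precision={precision_score(y_true, y_pred):.2f}, "
        f"recall={recall_score(y_true, y_pred):.2f}"
    )
# Raising the threshold here increases precision (1.00 vs 0.75)
# at the cost of recall (0.50 vs 0.75) -- the model itself is unchanged.
```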
Confusion Matrix
A confusion matrix is like a report card for our model. It tells us how well our model did in predicting the two classes. The confusion matrix has two main sections: Predicted as Negative and Predicted as Positive. The threshold defines the proportions between the two.
Under the two sections, there are four elements, which are determined by the scores provided to the entities by the model:
True '1': This is when the model correctly predicts that something is positive.
False '1': This is when the model incorrectly predicts that something is positive.
True '0': This is when the model correctly predicts that something is negative.
False '0': This is when the model incorrectly predicts that something is negative.
The confusion matrix shows us how many times our model made each of these types of predictions. By looking at the confusion matrix, we can see how well our model is doing and whether it is making more mistakes with false positives or false negatives.
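For reference, here is a minimal sketch (using scikit-learn, which is not part of the dashboard itself) of how these four cells are computed from labels and predicted classes:

```python
from sklearn.metrics import confusion_matrix

# Illustrative labels and predicted classes.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

# scikit-learn returns rows = actual class, columns = predicted class.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"True '1' (TP):  {tp}")   # correctly predicted positive
print(f"False '1' (FP): {fp}")   # incorrectly predicted positive
print(f"True '0' (TN):  {tn}")   # correctly predicted negative
print(f"False '0' (FN): {fn}")   # incorrectly predicted negative
```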
Benchmark Comparison
The benchmark is a simple rule-based model created using a single column from the data that strongly correlates to the Target column. It serves as a reference point to ensure that the model has sufficient predictive power and can outperform the benchmark.
The health check verifies that the model is indeed performing better than the benchmark.
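As a rough illustration only (the actual benchmark logic is Pecan's own), a single-column, rule-based benchmark could be as simple as thresholding the most correlated column and measuring how often that rule is right:

```python
import pandas as pd

# Hypothetical data: one strongly correlated column plus the target.
df = pd.DataFrame({
    "sessions_last_30d": [1, 14, 3, 22, 0, 9, 17, 2],
    "converted":         [0, 1, 0, 1, 0, 0, 1, 0],
})

# Rule-based benchmark: predict "positive" whenever the column exceeds its median.
rule_pred = (df["sessions_last_30d"] > df["sessions_last_30d"].median()).astype(int)
benchmark_accuracy = (rule_pred == df["converted"]).mean()
print(f"Benchmark accuracy: {benchmark_accuracy:.0%}")
# A healthy model should outperform this simple rule on the same test set.
```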
Test predictions over time
The “Test Set Results Over Time” widget is a powerful tool for monitoring the consistency of your model’s performance.
This graph compares the average predicted probability of an event (like a purchase) with the actual occurrence rate over time. By visually tracking these trends side by side, you can detect any signs of model drift that might not be evident in standard metrics.
If the predicted probabilities start to diverge from the actual outcomes, it could indicate that the model is struggling to maintain accuracy, signaling the need for closer inspection or potential retraining. Keeping an eye on this widget helps ensure that your model remains reliable and accurate over time, even as conditions change.
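Conceptually, the same comparison can be reproduced from an exported test set with a pandas sketch like this (the file name and the marker, probability, and actual column names are assumptions):

```python
import pandas as pd

# Hypothetical export, e.g. from "Save as CSV"; column names are assumed.
test = pd.read_csv("test_set_export.csv", parse_dates=["marker"])

monthly = (
    test
    .assign(month=test["marker"].dt.to_period("M"))
    .groupby("month")
    .agg(avg_predicted_probability=("probability", "mean"),
         actual_positive_rate=("actual", "mean"))
)
print(monthly)
# A growing gap between the two columns over time is a possible sign of model drift.
```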
Performance Consistency (overfit)
The “Performance Consistency” widget highlights the AUPRC (Area Under the Precision-Recall Curve) values for both the training and test datasets, helping you spot overfitting (a model that performs much better on its training data than on unseen data). AUPRC is a crucial metric in machine learning, especially when dealing with imbalanced datasets, where one class is much more frequent than the other.
Understanding AUPRC:
Precision measures the accuracy of the positive predictions. It answers the question: “Out of all the instances predicted as positive, how many are actually positive?”
Recall measures the ability of a model to find all the relevant cases (all actual positives). It answers the question: “Out of all actual positives, how many were identified correctly?”
The AUPRC, therefore, provides a single measure of overall performance of a model across all classification thresholds, combining these two aspects. A higher AUPRC value indicates a model that can discriminate positive cases more effectively, which is particularly important in scenarios like fraud detection or disease screening where missing a positive case can be costly.
Why AUPRC in this Widget is Important:
Comparative Insight: Seeing the AUPRC for both training and test sets side by side helps you understand not just how well the model predicts, but also how precise and reliable those predictions are across different data samples.
Model Reliability: A model with a high AUPRC on both training and test datasets is generally considered robust and reliable, as it consistently identifies the positive class accurately and with high confidence.
By showing AUPRC values, this widget helps you quickly assess the precision and recall balance of your model, giving you a comprehensive view of its performance and reliability.
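If you want to reproduce the comparison yourself, a minimal sketch with scikit-learn (using made-up labels and probability scores for both sets) looks like this:

```python
from sklearn.metrics import average_precision_score

# Illustrative labels and probability scores for the train and test sets.
train_true,  train_scores = [1, 0, 1, 0, 0, 1], [0.9, 0.2, 0.8, 0.3, 0.4, 0.7]
test_true,   test_scores  = [0, 1, 0, 1, 0, 0], [0.3, 0.7, 0.4, 0.6, 0.2, 0.5]

# Average precision is scikit-learn's standard way to summarize the PR curve.
train_auprc = average_precision_score(train_true, train_scores)
test_auprc = average_precision_score(test_true, test_scores)
print(f"Train AUPRC: {train_auprc:.2f}, Test AUPRC: {test_auprc:.2f}")
# A large gap (train much higher than test) is a common sign of overfitting.
```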
Attribute Columns & Features Importance
When your model is trained, it uses the columns from the Attribute tables to find common patterns and similarities of the Target population. The model assigns different weights to the columns according to the impact they had on predicting the Target.
The importance of each column is calculated by summing the importance of all the AI aggregations (also known as features) that were extracted from the column.
For a comprehensive explanation of the widget and how to interpret it, see Understanding Column importance.
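Conceptually (not Pecan's internal computation), summing feature importances back to their source columns looks like this, with hypothetical column and feature names:

```python
import pandas as pd

# Hypothetical per-feature importances, each tagged with its source column.
features = pd.DataFrame({
    "source_column": ["last_login", "last_login", "country", "plan_type", "plan_type"],
    "feature":       ["days_since_last_login", "login_count_30d",
                      "country_is_US", "plan_type_is_pro", "plan_type_changed"],
    "importance":    [0.21, 0.09, 0.12, 0.33, 0.05],
})

# Column importance = sum of the importances of all features derived from it.
column_importance = (
    features.groupby("source_column")["importance"].sum().sort_values(ascending=False)
)
print(column_importance)
# plan_type     0.38
# last_login    0.30
# country       0.12
```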
Columns & Features Values Effect
Clicking on each feature will load a Feature Effect Graph (a.k.a. Partial Dependence Plot or PDP) on the right side of the widget, displaying a graph based on SHAP values. This graph shows the effect of each feature and its values on your model’s predictions.
☝️ Remember:
ML models are VERY complex, and you cannot attribute an impact to a single feature or value in isolation, as each one works together with numerous other features to arrive at the final probability score.
The graph shows the top 10 categories or a value histogram, their average impact, and their maximum and minimum impact on the probability score.
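As an approximation of what such a graph is built from (Pecan's exact computation may differ), you could group SHAP values by feature value and look at their average, minimum, and maximum impact:

```python
import pandas as pd

# Hypothetical SHAP values for one feature ("plan_type") across test-set entities.
shap_df = pd.DataFrame({
    "plan_type":  ["free", "pro", "free", "enterprise", "pro", "free"],
    "shap_value": [-0.08, 0.12, -0.05, 0.20, 0.09, -0.11],
})

# Per-category impact on the probability score: average, min, and max.
effect = (
    shap_df.groupby("plan_type")["shap_value"]
    .agg(avg_impact="mean", min_impact="min", max_impact="max")
    .sort_values("avg_impact", ascending=False)
)
print(effect)
```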
Core set statistics
The “Core Set Statistics” widget on our dashboard provides essential insights into the dataset used for training your model. Here’s why this information is crucial:
Volume of Core Set Samples: This section shows the total number of samples (or records) your model was trained on. A larger number of samples generally means more data for the model to learn from, which can lead to more accurate predictions. This number should align with the numbers you know from the real world; otherwise, something might be off with your core_set query.
Train & Validation vs. Test Set Distribution: Displays how the samples are split between training/validation and testing. This split is important because the model learns from the training set and is then evaluated on the test set to ensure it performs well on new, unseen data. A big difference in distribution between the sets can result in an inaccurate model or misleading dashboard metrics (a quick way to check this yourself is sketched below).
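Here is that quick sanity check as a pandas sketch, with the set and label column names assumed for illustration:

```python
import pandas as pd

# Hypothetical core set rows with their split assignment and label.
core_set = pd.DataFrame({
    "set":   ["train_valid"] * 9 + ["test"],
    "label": [1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
})

# Sample count and positive rate per split.
summary = core_set.groupby("set")["label"].agg(samples="size", positive_rate="mean")
print(summary)
# Large gaps in positive rate between the sets can make the dashboard metrics misleading.
```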
Core set over time
Get insights into the stability and behavior of the label column in your prediction model over time to help you ensure it aligns with the data you're already familiar with.
For example, if your conversion rate is usually around 27%, check that the graph shows a similar rate over time and identify periods when it's higher or lower.
The graph also shows the split between the train and test sets, allowing you to ensure that the trends stay consistent between the two sets.
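A rough equivalent in pandas (with marker, set, and label column names assumed) would be grouping the label by month and by set:

```python
import pandas as pd

# Hypothetical core set rows; column names are assumptions.
core_set = pd.DataFrame({
    "marker": pd.to_datetime(["2024-01-15", "2024-01-20", "2024-02-10",
                              "2024-02-18", "2024-03-05", "2024-03-22"]),
    "set":    ["train", "train", "train", "train", "test", "test"],
    "label":  [1, 0, 0, 1, 1, 0],
})

# Label rate per month, split by train/test set.
label_rate = (
    core_set
    .assign(month=core_set["marker"].dt.to_period("M"))
    .groupby(["month", "set"])["label"]
    .mean()
)
print(label_rate)  # spot months where the rate diverges from your usual level
```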
Attribute table statistics
Get an insightful, tabular view of the analysis conducted on your data attributes, providing you with a deeper understanding of how your data is structured and utilized in model training.
Before diving into the details, it's crucial to remember that the analysis presented in this widget is based on your train dataset, which is about 80% of your entire dataset. This means the figures might appear smaller than anticipated, as they don't represent the full dataset.
The widget provides a comprehensive overview of each table used in your model’s training. Here's what you can discover at a glance:
Row and Column Count: Understand the size and complexity of your table with the total number of rows and columns.
Column Types: Get insights into the composition of your table with a count of date, category, and numeric columns.
Dropped Columns: See how many columns are not utilized in model training, including the count and the reasoning behind their exclusion.
Entity Row Distribution: Discover the range of rows per entity, shown in the format [min]-[max], revealing the relationship type (1:1 or 1:many) within your data.
For an in-depth understanding, you can expand each table to view specific details about its columns:
Column Name: The actual name of the column as it appears in your schema.
Original Type: The data type assigned to the column in your DWH, providing a glimpse into its original format.
Pecan Transformation: How Pecan interprets and utilizes each column for its feature engineering process. If a column is marked as "dropped," you’ll also see why it wasn’t used for training the model.
Unique Values: The count of distinct values within a column, reflecting its diversity.
Missing Values: The number of NULL or missing entries, crucial for understanding data completeness.
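If you want to cross-check these numbers against your own data warehouse extract, a per-column summary along the same lines can be sketched with pandas (the sample columns below are hypothetical):

```python
import pandas as pd

# Hypothetical attribute table extract.
attributes = pd.DataFrame({
    "signup_date":  pd.to_datetime(["2024-01-02", "2024-02-10", None]),
    "country":      ["US", "DE", "US"],
    "total_spend":  [120.5, None, 88.0],
})

# Per-column profile: original type, distinct values, and missing entries.
profile = pd.DataFrame({
    "original_type": attributes.dtypes.astype(str),
    "unique_values": attributes.nunique(),
    "missing_values": attributes.isna().sum(),
})
print(f"Rows: {len(attributes)}, Columns: {attributes.shape[1]}")
print(profile)
```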
Example Output Tab
This tab displays a sample of 1,000 predictions in your dataset, including:
EntityID & Marker
Actual value
Predicted value
Error (predicted value - actual value)
The 10 features that contributed most to the specific prediction (shown when clicking one of the rows).
You can download the full output table to a spreadsheet by clicking Save as CSV.
For more details, see this article: Understanding Explainability & Prediction Details.
Exporting Your Dashboard
By using the Export button in the top-right corner, you can do the following:
Send a link to this dashboard to one of your teammates (they must have their own Pecan user in your workspace to be able to view it).
Export a model summary in PDF format to share with whomever you please.
Download the full test set as a CSV file so you can run your own tests and calculations if you'd like.