Once your Pecan model is fully trained, you can view its performance in an interactive dashboard.
Pecan's dashboard provides statistical information and tools to help you understand how accurate your model is, tailor it to your business needs, monitor predictions over time, and discover the importance of different columns in your data.
It is important to remember that the metrics displayed in your dashboard are for your Test Set, which is the latest 10% of the training data that Pecan automatically sets aside during the training process. Pecan treats this data as "fresh data": it asks your model to create predictions for it and then compares those predictions against the actual values to evaluate your model's predictive performance.
Below is a breakdown of each dashboard component for a regression model, listed in order of appearance.
If you want more information, you can always click the Explain button on your dashboard to get tailored explanations about your model's results and ask any questions you might have.
Model Evaluation Tab
Head Metric & Quality Comparison - How Good Is Your Model?
The “Error Bias” widget on our new dashboard for regression models is designed to give you a clear, immediate understanding of how close the model’s predictions are to reality. This widget simplifies assessing the model’s accuracy by displaying the average percentage difference between the model’s predictions and the actual observed values.
Why is this Important?
Imagine you’re using a new weather app to find out if you need an umbrella for the day. If the app consistently predicts less rain than what actually happens, you might end up getting soaked! Similarly, the “Error Bias” widget shows you if the model tends to overestimate or underestimate the outcomes, by how much, and helps you gauge the reliability of its predictions.
What Does the Widget Show?
Error Bias Percentage: This number tells you, on average, how much the model’s predictions differ from the actual results. In the provided dashboard snapshot, the predictions are, on average, 14.5% lower than the actual values.
Comparison Bars: These bars give a visual comparison of the total actual values versus the total predicted values over a specified period, helping you see at a glance whether the model tends to predict more or less than the actual figures.
Trend Lines: The graph shows the daily actual values versus the predicted ones, helping you track how this relationship changes over time. It’s a great way to visually inspect whether certain times or conditions affect the accuracy of the predictions.
This widget is a powerful tool for quickly understanding the model’s practical performance and ensuring that your decisions based on these predictions are informed and reliable.
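If you want to sanity-check this kind of number against your own data, here is a minimal pandas sketch that computes an error-bias percentage and the daily actual-vs-predicted totals. The column names (date, actual, predicted) and the formula are illustrative assumptions, not Pecan's exact internal calculation.

```python
import pandas as pd

# Hypothetical prediction log; your column names and values will differ.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]),
    "actual": [120.0, 80.0, 100.0, 60.0],
    "predicted": [100.0, 75.0, 90.0, 70.0],
})

# One way to express error bias: how far total predictions fall from total
# actuals, as a percentage (Pecan's exact formula may differ).
error_bias_pct = (df["predicted"].sum() - df["actual"].sum()) / df["actual"].sum() * 100
print(f"Error bias: {error_bias_pct:+.1f}%")  # negative => the model under-predicts

# Daily totals - the tabular counterpart of the widget's trend lines.
daily = df.groupby("date")[["actual", "predicted"]].sum()
print(daily)
```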
Customize model evaluation for your needs
Sharing how you plan to use your predictions may change the main head metric we show you, so that it best matches your use case.
All other metrics remain available in the widgets below.
"Explore Your Model" Widgets
Dive deeper into your model's performance
Metrics Analysis
This widget provides a nuanced view of your model’s performance through key statistical metrics. Let’s break down what these metrics mean and why they are essential, even if you’re not a machine learning expert.
Error Bias and Weighted MPE (Mean Percentage Error):
This figure shows the average percentage by which the model’s predictions differ from actual outcomes. A negative value, such as -14.5%, indicates that the model generally predicts values that are lower than the actual figures. This metric helps you understand the tendency of your model to overestimate or underestimate the data.
The weighted version of the mean percentage error gives more importance to larger errors: rather than treating all errors equally, larger deviations from the actual values have a greater impact on the metric, emphasizing significant inaccuracies over smaller ones.
Mean Absolute Error (MAE):
This metric shows the average extent of the errors in predictions, expressed as a percentage. It measures the average magnitude of the errors in a set of predictions, without considering their direction (over or under). It’s a straightforward measure of prediction accuracy where lower values are better, indicating that the predictions are closer to the actual results.
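For readers who want to see these definitions in code, here is a small numpy sketch of a mean percentage error, a weighted variant, and a mean absolute error. The weighting scheme shown (weights proportional to absolute error) is only an illustration; Pecan's exact weighting may differ.

```python
import numpy as np

actual = np.array([200.0, 150.0, 100.0, 50.0])
predicted = np.array([170.0, 140.0, 95.0, 60.0])

# Mean Percentage Error: signed, so over- and under-predictions can cancel out.
mpe = np.mean((predicted - actual) / actual) * 100

# A weighted variant that gives larger absolute errors more influence
# (illustrative only; Pecan's weighting scheme may differ).
abs_err = np.abs(predicted - actual)
weights = abs_err / abs_err.sum()
weighted_mpe = np.sum(weights * (predicted - actual) / actual) * 100

# Mean Absolute Error: average error magnitude, ignoring direction.
mae = np.mean(abs_err)

print(f"MPE: {mpe:+.1f}%  Weighted MPE: {weighted_mpe:+.1f}%  MAE: {mae:.1f}")
```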
Why These Metrics Matter?
Understanding these metrics is crucial because they provide a clear picture of how well the model performs in practical scenarios. Whether you’re using the model to forecast sales, predict market trends, or manage inventory, these metrics:
Offer insights into the reliability and accuracy of the model.
Help identify if the model has a consistent bias (tending to predict too high or too low).
Enable you to gauge the overall impact of prediction errors on your operational decisions.
By keeping an eye on these metrics, you can make informed decisions about how to use the model’s outputs or when it might be necessary to retrain the model to align better with actual outcomes.
Model vs Benchmark by range
This visualization helps you understand the model’s strengths and weaknesses in various segments, enabling more targeted improvements.
What Does This Widget Show?
Graphical Representation: The graph displays the average values predicted by your model and the benchmark across a range of data segments, from low to high values.
Actual vs. Predicted: It compares actual values to those predicted by both your model and the benchmark, giving you a clear visual of where each method performs best.
Segment Analysis: Below the graph, the data is broken down into segments (low, mid, high values), showing total values, group sizes, and mean errors for each. This breakdown helps pinpoint exactly where your model outperforms or underperforms the benchmark.
Why is this Important?
Precision in Performance Evaluation: By comparing your model’s predictions against a benchmark across different value ranges, you can specifically see where adjustments might be needed—for example, in handling very high or very low values.
Identifying Trends: This widget can reveal trends such as a model’s tendency to underestimate or overestimate in certain ranges, guiding you to refine model training or feature engineering.
Enhanced Strategic Decisions: Understanding these dynamics empowers you to make informed decisions about deploying the model in real-world scenarios, ensuring it is utilized where it is most effective.
The ability to segment performance by different ranges of values is crucial, especially in fields like finance, sales forecasting, or inventory management, where response to scale can significantly impact outcomes. This widget provides a detailed, easy-to-understand overview of how well your model stacks up against traditional rule-based methods, highlighting both its utility and areas for improvement.
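To illustrate the idea behind this breakdown, the sketch below segments a hypothetical test set into low/mid/high ranges and compares the model's mean absolute error against a naive benchmark in each segment. The synthetic data, the benchmark choice (always predicting the overall mean), and the three-way split are assumptions for demonstration only.

```python
import numpy as np
import pandas as pd

# Hypothetical test-set values: actuals, model predictions, and a simple
# rule-based benchmark (here, always predicting the overall mean).
rng = np.random.default_rng(0)
actual = rng.gamma(shape=2.0, scale=50.0, size=1000)
model_pred = actual * rng.normal(0.9, 0.1, size=1000)   # a slightly biased model
benchmark_pred = np.full_like(actual, actual.mean())

df = pd.DataFrame({"actual": actual, "model": model_pred, "benchmark": benchmark_pred})
df["segment"] = pd.qcut(df["actual"], q=3, labels=["low", "mid", "high"])
df["model_abs_err"] = (df["model"] - df["actual"]).abs()
df["benchmark_abs_err"] = (df["benchmark"] - df["actual"]).abs()

# Per-segment group sizes, totals, and mean errors - the widget's breakdown table.
summary = df.groupby("segment", observed=True).agg(
    rows=("actual", "size"),
    total_actual=("actual", "sum"),
    model_mae=("model_abs_err", "mean"),
    benchmark_mae=("benchmark_abs_err", "mean"),
)
print(summary.round(1))
```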
Performance Consistency (overfit)
The “Performance Consistency (overfit)” widget is a critical tool for evaluating the robustness of your model across different datasets—specifically, how it performs on training data versus unseen test data.
This widget uses the R² (R-squared) value, a standard measure of model accuracy that quantifies the share of the target variable's variation that is explained by the model.
Understanding the Widget:
Train & Validation set R²: This shows the R² score on the training dataset, which the model has seen during the learning phase. A high R² value here, like 99.2%, indicates that the model explains nearly all the variability of the response data around its mean.
Test set R²: This score reflects how well the model predicts new, unseen data. An R² of 94.8% in the test set suggests that the model remains highly effective even on data it hasn’t previously encountered.
Why is this Important?
Consistency: The relatively close R² values between the training and testing sets indicate that the model generalizes well. This is essential because a model that only performs well on its training data but poorly on new data (a common issue known as overfitting) is less useful in practical applications.
Reliability: By maintaining high R² values across both sets, the model shows it can reliably predict outcomes across varied datasets, not just the data it was trained on.
Model Health: Consistent R² scores are a good sign of a healthy model. If the test score were significantly lower than the training score, it might suggest the model is overfitting and too tailored to the training data, missing broader trends that apply to the general population or different contexts.
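If you'd like to reproduce this kind of check outside of Pecan, here is a scikit-learn sketch on synthetic data that compares train and test R² scores; the 0.1 gap used as a warning threshold is an arbitrary, illustrative choice.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic example only; your own model and data will differ.
X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)

r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
print(f"Train R²: {r2_train:.3f}  Test R²: {r2_test:.3f}")

# A large gap between the two scores is a typical sign of overfitting.
if r2_train - r2_test > 0.1:
    print("Warning: possible overfit - test performance lags the training set.")
```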
Attribute Columns & Features Importance
When your model is trained, it uses the columns from the Attribute tables to find common patterns and similarities within the Target population. The model assigns different weights to the columns according to the impact they had on predicting the Target.
The importance of each column is calculated by summing the importance of all the AI aggregations (also known as features) that were extracted from the column.
For a comprehensive explanation of the widget and how to interpret it, see Understanding Column importance.
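The aggregation itself is simple to picture: the sketch below sums hypothetical feature importances back to the columns they were derived from. The feature names, source columns, and importance values are made up for illustration.

```python
import pandas as pd

# Hypothetical feature importances, where each engineered feature (aggregation)
# is tagged with the source column it was derived from.
features = pd.DataFrame({
    "feature": ["amount_sum_30d", "amount_avg_90d", "country_top1", "visits_count_7d"],
    "source_column": ["amount", "amount", "country", "visits"],
    "importance": [0.35, 0.20, 0.30, 0.15],
})

# Column importance = the sum of the importances of all features derived from it.
column_importance = (
    features.groupby("source_column")["importance"]
    .sum()
    .sort_values(ascending=False)
)
print(column_importance)  # amount: 0.55, country: 0.30, visits: 0.15
```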
Columns & Features Values Effect
Clicking on each feature will load a Feature Effect Graph (a.k.a. Partial Dependence Plot or PDP) on the right side of the widget, displaying a graph based on the SHAP values. This graph shows the effect of each feature and its values on your model’s predictions.
☝️ Remember:
ML models are VERY complex, and you cannot attribute the outcome to a single feature or value in isolation, as each one works together with numerous other features to arrive at the final prediction.
The graph shows the top 10 categories or a value histogram, their average impact, and their maximum and minimum impact on the predicted value.
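To get a feel for what sits behind such a graph, here is a rough sketch using the open-source shap package on a synthetic scikit-learn model: it computes per-row SHAP values and then summarizes one feature's average, minimum, and maximum impact per value bucket. This is a generic illustration of the technique, not Pecan's implementation.

```python
import pandas as pd
import shap  # pip install shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for your training data; your columns will differ.
X, y = make_regression(n_samples=500, n_features=5, noise=5.0, random_state=0)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(5)])
model = RandomForestRegressor(random_state=0).fit(X, y)

# SHAP values: per-row, per-feature contributions to the prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Summarize one feature's effect the way the widget does: bucket its values,
# then look at the average / min / max impact within each bucket.
feature = "feature_0"
idx = X.columns.get_loc(feature)
buckets = pd.qcut(X[feature], q=5)
effect = pd.DataFrame({"bucket": buckets, "impact": shap_values[:, idx]})
print(effect.groupby("bucket", observed=True)["impact"].agg(["mean", "min", "max"]))
```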
Core set statistics
The “Core Set Statistics” widget on our dashboard provides essential insights into the dataset used for training your model. Here’s why this information is crucial:
Volume of Core Set Samples: This section shows the total number of samples (or records) your model was trained on. A larger number of samples generally means more data for the model to learn from, which can lead to more accurate predictions. That number should align with the figures you know from the real world; otherwise, something might be off with your core_set query.
Train & Validation vs. Test Set Distribution: Displays how the samples are split between training/validation and testing. This split is important because the model learns from the training set and is then evaluated on the test set to ensure it can perform well on new, unseen data. A big difference between the two distributions can result in an inaccurate model or a misleading dashboard.
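The sketch below mimics this on a hypothetical core set: it takes roughly the latest 10% of rows as the test set (the split described above; exact proportions may differ) and compares the label distribution between the two parts.

```python
import numpy as np
import pandas as pd

# Hypothetical core set with an entity ID, a timestamp (marker), and a numeric label.
rng = np.random.default_rng(1)
core_set = pd.DataFrame({
    "entity_id": np.arange(2_000),
    "marker": pd.date_range("2020-01-01", periods=2_000, freq="D"),
    "label": rng.gamma(2.0, 40.0, size=2_000),
}).sort_values("marker")

# Chronological split: roughly the latest 10% of rows as the test set.
cutoff = int(len(core_set) * 0.9)
train_val, test = core_set.iloc[:cutoff], core_set.iloc[cutoff:]

print(f"Core set samples: {len(core_set):,}")
print(f"Train & validation: {len(train_val):,}  Test: {len(test):,}")

# A large gap between these two distributions can translate into an
# inaccurate model or a misleading dashboard.
print(train_val["label"].describe())
print(test["label"].describe())
```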
Core set over time
Get insights into the stability and behavior of the label column in your prediction model over time to help you ensure it aligns with the data you're already familiar with.
For example, if your conversion rate is usually around 27%, check that the graph shows a similar rate over time, and identify periods when it is higher or lower.
The graph also shows the split between the train and test sets, allowing you to ensure that the trends stay consistent between the two sets.
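A tabular version of this graph is easy to approximate: the sketch below averages a hypothetical numeric label per month, separately for the train and test portions, so you can spot months that drift from the level you expect.

```python
import numpy as np
import pandas as pd

# Hypothetical core set: a timestamp (marker), a numeric label, and a split flag.
rng = np.random.default_rng(2)
markers = pd.date_range("2023-01-01", periods=365, freq="D")
core_set = pd.DataFrame({
    "marker": markers,
    "label": rng.normal(100.0, 15.0, size=len(markers)),
})
core_set["split"] = np.where(core_set["marker"] < "2023-11-15", "train", "test")

# Monthly average of the label, per split - the tabular equivalent of the graph.
over_time = (
    core_set.groupby([pd.Grouper(key="marker", freq="MS"), "split"])["label"]
    .mean()
    .unstack("split")
)
print(over_time.round(1))
```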
Attribute table statistics
Get an insightful, tabular view of the analysis conducted on your data attributes, providing you with a deeper understanding of how your data is structured and utilized in model training.
Before diving into the details, it's crucial to remember that the analysis presented in this widget is based on your train dataset, which is about 80% of your entire dataset. This means the figures might appear smaller than anticipated, as they don't represent the full dataset.
The widget provides a comprehensive overview of each table used in your model’s training. Here's what you can discover at a glance:
Row and Column Count: Understand the size and complexity of your table with the total number of rows and columns.
Column Types: Get insights into the composition of your table with a count of date, category, and numeric columns.
Dropped Columns: See how many columns are not utilized in model training, including the count and the reasoning behind their exclusion.
Entity Row Distribution: Discover the range of rows per entity, revealing the relationship type (1:1 or 1:many) within your data, in the structure of [min]-[max].
For an in-depth understanding, you can expand each table to view specific details about its columns:
Column Name: The actual name of the column as it appears in your schema.
Original Type: The data type assigned to the column in your DWH, providing a glimpse into its original format.
Pecan Transformation: How Pecan interprets and utilizes each column for its feature engineering process. If a column is marked as "dropped," you’ll also see why it wasn’t used for training the model.
Unique Values: The count of distinct values within a column, reflecting its diversity.
Missing Values: The number of NULL or missing entries, crucial for understanding data completeness.
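If you want to run a similar per-column profile on your own attribute tables, the pandas sketch below computes the original type, unique-value count, and missing-value count for each column of a hypothetical table.

```python
import pandas as pd

# Hypothetical attribute table; replace with your own data.
attrs = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "signup_date": pd.to_datetime(["2023-01-05", "2023-02-10", None, "2023-03-01"]),
    "plan": ["basic", "pro", "pro", None],
    "monthly_spend": [10.0, 99.0, 120.0, 45.0],
})

# Per-column profile similar to the expanded widget view:
# original type, number of unique values, and number of missing values.
profile = pd.DataFrame({
    "original_type": attrs.dtypes.astype(str),
    "unique_values": attrs.nunique(dropna=True),
    "missing_values": attrs.isna().sum(),
})
print(profile)
```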
Example Output Tab
This tab displays a sample of 1,000 predictions in your dataset, including:
EntityID & Marker
Actual value
Predicted value
Error (predicted value - actual value)
The 10 features contributing most to the specific prediction (shown when clicking one of the rows).
You can download the full output table to a spreadsheet by clicking Save as CSV.
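The sketch below rebuilds a tiny version of this output table, including the error column as defined above, and writes it to a CSV for your own follow-up analysis; the column names and values are illustrative.

```python
import pandas as pd

# Hypothetical sample of the output table shown in this tab.
output = pd.DataFrame({
    "entity_id": [101, 102, 103],
    "marker": pd.to_datetime(["2024-03-01", "2024-03-01", "2024-03-02"]),
    "actual": [250.0, 80.0, 130.0],
    "predicted": [230.0, 95.0, 128.0],
})
output["error"] = output["predicted"] - output["actual"]  # predicted value - actual value

# Equivalent of "Save as CSV" for further analysis in a spreadsheet.
output.to_csv("model_output_sample.csv", index=False)
print(output)
```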
For more details, see this article: Understanding Explainability & Prediction Details.
Exporting Your Dashboard
Using the Export button in the top-right corner, you can do the following:
Send a link to this dashboard to one of your teammates (they must have their own Pecan user in your workspace to view it).
Export a model summary as a PDF to share with whoever you please.
Download the full test set as a CSV file so you can run your own tests and calculations if you'd like.