How do you know if your model is good?

To determine your model's performance, compare its lift over a random guess and over a benchmark model. You can also run A/B tests and compare business outcomes.


Pecan’s goal is to help you achieve real business value by using predictive ML models. This value is realized when a Pecan model demonstrates a lift over what would be achieved through traditional BI rules, or with no model at all.

Hence, to determine whether a model's performance is good, you should examine the lift it provides over a benchmark model and, in binary classification models, over a random guess.

Comparison to Random Guess

Comparing a model's performance to a random guess aims to answer the question: “What lift does the model provide compared to a situation where no logic is implemented to detect the targeted population?”

What is a random guess?

In Pecan, the random guess is determined by the rate of the positive target in the data (a.k.a. the target rate), which serves as a reference point for determining whether the model is good.

Suppose the target rate is 10%. In this case, a random guess model that predicts 100 entities as positives would be correct in 10 cases, resulting in a Precision of 10%. If an AI model predicts positive cases at a rate higher than the target rate, it performs better than a random guess model. For instance, a model that achieves a Precision of 20% is twice as effective as a random guess model.
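
To make the arithmetic concrete, here is a minimal sketch of that lift calculation, using the hypothetical numbers from the example above:

```python
# Minimal sketch: lift of a model's precision over a random-guess baseline.
# The numbers are the hypothetical values from the example above.

target_rate = 0.10      # share of positive targets in the data (the random-guess precision)
model_precision = 0.20  # precision achieved by the model on its predicted positives

lift = model_precision / target_rate
print(f"Lift over random guess: {lift:.1f}x")  # -> 2.0x, i.e. twice as effective
```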

Comparison to rule-based logic

Comparing a model's performance to a benchmark model aims to answer the question: “What lift does the model provide compared to a simple rule-based model?”

What is rule-based logic?

In Pecan, a rule-based benchmark model is produced for every model and serves as a reference point.

The benchmark is a simple rule-based method based on the single column from the attribute tables that has the highest correlation with the target. The benchmark model is compared to Pecan’s more complex tree-based models to highlight the model's predictive power.

If a naive model appears to perform as well as, or even better than, your Pecan model, and there are no underlying issues with your datasets, this may indicate that you don’t need a tree-based machine-learning model for your business needs.

Suppose we built a Churn model, and the attribute which has the strongest correlation with the label is “number_of_days_since_last_activity”. The entities will be automatically split into groups, and the benchmark will assign each group a probability score.
For example:

  • “number_of_days_since_last_activity” is lower than 3,
    then the probability of Churn is 0%.

  • “number_of_days_since_last_activity” is between 3-10,
    then the probability of Churn is 20%.

  • “number_of_days_since_last_activity” is between 10-21,
    then the probability of Churn is 60%.

  • “number_of_days_since_last_activity” is higher than 21,
    then the probability of Churn is 95%.
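
In code, such a benchmark rule could look like the following minimal sketch; the thresholds and probabilities are the illustrative values above, not Pecan's actual implementation:

```python
# Minimal sketch of a rule-based benchmark: map one attribute to a churn probability.
# Thresholds and probabilities are the illustrative values from the example above.

def benchmark_churn_probability(days_since_last_activity: int) -> float:
    if days_since_last_activity < 3:
        return 0.00
    elif days_since_last_activity <= 10:
        return 0.20
    elif days_since_last_activity <= 21:
        return 0.60
    else:
        return 0.95

print(benchmark_churn_probability(5))   # 0.2
print(benchmark_churn_probability(30))  # 0.95
```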


A/B Testing

One way to assess the value of your Pecan model is by conducting an A/B test. Here, you would make predictions for one population based on a Pecan model and make predictions for an equivalent population based on a naive model (as explained above).

You would have two primary ways to compare these models – based on statistical performance and based on business outcomes.

Comparing based on the statistical performance

Say you want to predict the likelihood of customers upgrading to VIP status. You can make predictions for one population using a Pecan model and make predictions for an equivalent population using a naive model.

Once the prediction window – the period for which you made predictions – has passed, you can compare each model’s predictions to the observed results for the corresponding population. This enables you to compare the statistical performance of each model.

Of course, it is essential to choose an appropriate evaluation metric – you can do this based on whether it’s a binary or regression problem and based on your particular business need or goal.

For example: if you want to improve the efficiency of your call center, which is able to handle 1,000 calls per month, you could evaluate the model by “Precision @ 1,000”. In Pecan, precision refers to a model’s ability to correctly predict instances of target behavior – in this case, among the top 1,000 entities predicted to perform the behavior.
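
As an illustration, here is a minimal sketch of how a “Precision @ k” metric could be computed, assuming you have each entity's predicted score and its observed outcome (the data below is made up):

```python
# Minimal sketch: Precision @ k, using made-up scores and observed outcomes.

def precision_at_k(scores, actuals, k):
    """Precision among the k entities with the highest predicted scores."""
    # Sort entities by predicted score, highest first, and take the top k.
    top_k = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)[:k]
    # Share of those top-k entities that actually performed the target behavior.
    return sum(actual for _, actual in top_k) / k

scores = [0.91, 0.85, 0.40, 0.77, 0.12]   # predicted scores from a model
actuals = [1, 0, 0, 1, 0]                 # 1 = performed the behavior, 0 = did not
print(precision_at_k(scores, actuals, k=3))  # 2 of the top 3 were correct -> 0.67
```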

Once you have done a statistical comparison, you can answer the question: which model was able to predict customer behavior or outcomes more accurately?

Comparing based on business outcomes

Now that you know how accurate each model is, you can assess its usefulness in the real world. Say you were to conduct a business treatment by offering customers a trial VIP subscription. Which population would respond better to this? Would you see greater conversion among those predicted to behave a certain way based on the Pecan model or based on the naive model? Do the results align with the assumptions made by those models?

By carrying out and comparing business treatments for each model/population, you may discover that you can generate a higher conversion rate (or upsells, retention, revenue, etc.) by using a Pecan model than by relying on other methods. This quantitative difference is where Pecan’s actual value resides. And when dealing with large numbers of customers and/or dollars, the impact can be massive.
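
As a simple illustration, such a comparison often boils down to comparing conversion rates between the two treated populations; the following sketch uses made-up numbers:

```python
# Minimal sketch: compare conversion rates for two A/B populations (made-up numbers).

pecan_group = {"treated": 5000, "converted": 650}   # population targeted with the Pecan model
naive_group = {"treated": 5000, "converted": 430}   # population targeted with the naive model

pecan_rate = pecan_group["converted"] / pecan_group["treated"]
naive_rate = naive_group["converted"] / naive_group["treated"]

print(f"Pecan model conversion rate: {pecan_rate:.1%}")
print(f"Naive model conversion rate: {naive_rate:.1%}")
print(f"Relative lift: {(pecan_rate / naive_rate - 1):.0%}")
```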

