Understanding Pecan’s Benchmarks
Benchmarks evaluate ML models by comparing them to rule-based models, making it easier to understand their performance and communicate their value to stakeholders
Written by Ori Sagi

In the world of AI, benchmarks play a crucial role in measuring the performance of various models. Benchmarks are used to evaluate the effectiveness of algorithms, compare different models, and measure progress.

Benchmarks in Pecan

Traditionally, benchmarks were created using rule-based algorithms that relied on a set of predefined rules and conditions.

In Pecan, a benchmark model is created alongside each AI model. It is a simple rule-based model that lets you see the lift the AI model provides over a straightforward rule. The benchmark model is built on the column with the highest correlation to the label.

This approach has three benefits:

  1. It provides a simple reference point for comparison.
    By comparing the performance of an AI model against the rule-based benchmark, we can estimate the lift provided by the AI model (a minimal sketch of such a comparison appears after this list).

  2. It helps to identify areas where the AI model is underperforming.
    If the AI model is not able to outperform the rule-based benchmark, it may indicate that there are limitations to the model or that more data is needed to improve its performance.

  3. It provides a way to communicate the value of AI to non-technical stakeholders.
    By presenting the lift provided by the AI model compared to the rule-based benchmark, we can help stakeholders understand the potential impact of AI on their business.
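To make the lift comparison concrete, here is a minimal sketch that compares an AI model's score to the benchmark's score on the same labelled rows. The sample values and the use of AUC as the comparison metric are illustrative assumptions, not Pecan's exact reporting.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical hold-out data: the AI model's probabilities and the
# benchmark's rule-based predictions for the same labelled rows.
y_true          = [0, 0, 1, 1, 0, 1, 1, 0]
ai_scores       = [0.1, 0.3, 0.8, 0.7, 0.2, 0.9, 0.6, 0.4]
benchmark_preds = [0,   0,   1,   0,   0,   1,   1,   1]

ai_auc        = roc_auc_score(y_true, ai_scores)
benchmark_auc = roc_auc_score(y_true, benchmark_preds)

# "Lift" here is simply the gap between the two AUCs; this is an
# illustrative comparison, not necessarily how Pecan reports lift.
print(f"AI model AUC:  {ai_auc:.2f}")
print(f"Benchmark AUC: {benchmark_auc:.2f}")
print(f"Lift:          {ai_auc - benchmark_auc:+.2f}")
```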

How is the benchmark calculated in Pecan?

To further elaborate on the process of creating a benchmark in Pecan, let's dive into the steps involved.

Step 1: Choosing the variable

The first step in creating a rule-based benchmark is selecting the variable on which the model will be built. This variable is chosen based on its correlation to the target (the label).

Why is correlation important?
Correlation measures the strength of the relationship between two variables. By selecting the variable with the highest correlation to the label, we are choosing the variable that has the strongest relationship with the outcome we are trying to predict.
This increases the chances that the benchmark will be a useful predictor of the label.
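Conceptually, this selection step boils down to ranking candidate columns by how strongly they correlate with the label and keeping the strongest one. The pandas sketch below illustrates the idea with hypothetical column names and values; it is not Pecan's actual implementation, which handles many more column types and edge cases.

```python
import pandas as pd

# Illustrative training data; all column names here are hypothetical.
df = pd.DataFrame({
    "age":                  [22, 35, 47, 63, 29, 54],
    "purchases_last_month": [0, 3, 1, 5, 2, 4],
    "label":                [0, 1, 0, 1, 0, 1],
})

# Correlate every candidate column with the label and pick the strongest
# (by absolute value) as the basis for the rule-based benchmark.
correlations = df.drop(columns="label").corrwith(df["label"]).abs()
benchmark_feature = correlations.idxmax()

print(correlations.sort_values(ascending=False))
print("Column chosen for the benchmark:", benchmark_feature)
```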

Step 2: Generating a rule based on groups of the chosen variable

Once the variable has been chosen, the next step is to generate a rule based on groups of the chosen variable. This involves dividing the chosen variable into groups and assigning a value to each group based on its relationship to the label.

For example, if the chosen variable is age, we may divide age into groups such as 0-18, 19-30, 31-50, and 51 and above. We would then assign a value to each group based on its relationship to the label. If the label is binary (e.g. 0 or 1), we may assign a value of 0 to groups with a low incidence of the label and a value of 1 to groups with a high incidence of the label.

Once the groups have been assigned values, we can use them to generate a rule-based model. This model will assign a predicted value to each observation based on the group to which it belongs. For example, if an observation belongs to the 31-50 age group, the model will predict a value based on the value assigned to that group.
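The sketch below illustrates this grouping logic in pandas, using the age example above. The bin edges, the 50% incidence threshold, and the data are all hypothetical assumptions; Pecan's actual grouping logic may differ.

```python
import pandas as pd

# Hypothetical data: age as the chosen variable and a binary label.
train = pd.DataFrame({
    "age":   [12, 17, 25, 28, 40, 45, 55, 70],
    "label": [0,  0,  1,  0,  1,  1,  1,  1],
})

# Divide the chosen variable into groups (bins).
bins = [0, 18, 30, 50, 120]
labels = ["0-18", "19-30", "31-50", "51+"]
train["age_group"] = pd.cut(train["age"], bins=bins, labels=labels)

# Assign each group a value based on the label's incidence within it.
# Here, groups where the label occurs in more than half of the rows get 1.
group_rate = train.groupby("age_group", observed=True)["label"].mean()
group_value = (group_rate > 0.5).astype(int)

# The rule-based benchmark: predict the value assigned to a row's group.
def benchmark_predict(age: float) -> int:
    group = pd.cut([age], bins=bins, labels=labels)[0]
    return int(group_value.get(group, 0))

print(group_value)
print("Prediction for a 42-year-old:", benchmark_predict(42))
```

In other words, the benchmark simply records which groups tend to have the label and predicts accordingly, which is exactly the kind of simple rule a trained AI model should be able to beat.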


Conclusion

The process of creating a benchmark in Pecan involves selecting the variable with the highest correlation to the label and generating a rule-based model based on groups of that variable. The benchmark model provides a simple reference point for comparison and can be used to evaluate the performance of AI models. By using benchmarks, we can measure the effectiveness of AI models and identify areas where they can be improved.
