Understanding Column importance

A peek into the "black box" of a model: the key to unlocking model insights and optimizing predictions by weighing each feature's impact on outcomes

Written by Linor Ben-El
Updated over a year ago

Machine learning models can be used to analyze and extract insights from large and complex data sets. When your model is trained, it uses the columns from the Attribute tables to find common patterns and similarities within the Target population. The model assigns different weights to the columns according to the impact each one had on predicting the Target.

What is Column Importance?

Column importance is a measure of how much a given column contributes to the predictive power of the model. The importance of each column is calculated by summing the importance of all the AI aggregations (also known as features) that were extracted from the column.

What are AI aggregations (features)?

Pecan automatically generates AI aggregations (features) based on the Attribute columns that were added to your model. This happens after the Entity, Target, and Attributes queries are defined and before model training begins. The raw data from the Attribute table is summarized and aggregated into new columns, thus creating new features and introducing new “points of view” for the model. The act of creating features on the raw data is called “feature engineering”, and it’s a crucial part of preparing the raw data to be AI-ready.

Feature engineering is typically done by data scientists with the help of data engineers but is done automatically with Pecan. Thanks to vast experience with multiple business use cases, Pecan is able to do this in a robust manner, constantly enriching and improving the best practices.

Example

In this example, the raw data includes two columns for this entity: event_date and spend. Pecan’s feature engineering process extracted new AI aggregations (features) from the raw data:

  • max spend

  • min spend

  • average spend

  • event count

  • average distance between events (date_diff)

The new data structure created by the feature engineering process, in which all the information about the entity is summarized into a single row, is AI-ready.
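As a rough illustration of the aggregations listed above, the following Python sketch collapses raw event rows into one feature row per entity. It is not Pecan's actual implementation: the sample data, the function name aggregate_events, and the output field names are all hypothetical and chosen only to mirror the example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical raw events: (entity_id, event_date, spend)
raw_events = [
    ("A", date(2023, 1, 1), 10.0),
    ("A", date(2023, 1, 5), 30.0),
    ("A", date(2023, 1, 11), 20.0),
    ("B", date(2023, 2, 1), 5.0),
]


def aggregate_events(events):
    """Collapse raw event rows into a single feature row per entity."""
    by_entity = defaultdict(list)
    for entity_id, event_date, spend in events:
        by_entity[entity_id].append((event_date, spend))

    rows = {}
    for entity_id, items in by_entity.items():
        items.sort()  # order the entity's events chronologically
        dates = [d for d, _ in items]
        spends = [s for _, s in items]
        # Days between each pair of consecutive events
        gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
        rows[entity_id] = {
            "max_spend": max(spends),
            "min_spend": min(spends),
            "avg_spend": mean(spends),
            "event_count": len(items),
            # Undefined when the entity has only one event
            "avg_date_diff": mean(gaps) if gaps else None,
        }
    return rows


features = aggregate_events(raw_events)
```

Each key in features is an entity, and each value is the single row of aggregated features that a model could train on.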

The importance of each AI aggregation (feature) is determined by computing the SHAP value of every feature (a number that increases as the feature's contribution increases) and then normalizing the results so that all feature scores sum to 100%.
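The normalization step, and the roll-up from features to columns described earlier, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Pecan's actual code: it assumes each feature already has a single non-negative SHAP-based score, and the scores, the mapping of features to source columns, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical SHAP-based score per feature, keyed by (source column, feature name).
# Which column each feature is attributed to is an assumption made for illustration.
shap_scores = {
    ("spend", "max_spend"): 0.6,
    ("spend", "min_spend"): 0.1,
    ("spend", "avg_spend"): 0.3,
    ("event_date", "event_count"): 0.8,
    ("event_date", "avg_date_diff"): 0.2,
}


def feature_importance(scores):
    """Normalize raw scores so that all feature importances sum to 100%."""
    total = sum(scores.values())
    return {key: 100.0 * value / total for key, value in scores.items()}


def column_importance(feature_pct):
    """Sum the importance of every feature extracted from the same column."""
    totals = defaultdict(float)
    for (column, _feature), pct in feature_pct.items():
        totals[column] += pct
    return dict(totals)


feature_pct = feature_importance(shap_scores)
column_pct = column_importance(feature_pct)
```

With these made-up scores, spend and event_date each end up with about half of the total importance, because the features extracted from each column account for half of the summed scores.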

How can column importance be used?

Understanding the model

Understanding which factors contribute to a model’s predictions – and to what degree – will enable you to refine your dataset and optimize the model itself. For example, you may wish to reduce “noise” in the model by reducing data points that have no meaningful impact or to enhance prediction accuracy by adding potentially relevant data to the model.

Finding interesting patterns of the Target

Understanding which factors can be used to predict reality will reveal opportunities to influence that reality by encouraging particular outcomes or behavior. For example, you may decide to send an email to customers who are predicted to churn, and this may even be informed by the fact that your model detects “date_since_last_email” as being a meaningful factor in predicting churn.

Of course, due to the complex nature of machine learning, it’s generally impossible to achieve complete clarity on how a model works. Each model is a complex statistical structure that holds many relationships between different variables, and these relationships cannot always be interpreted.
