SHAP values (SHapley Additive exPlanations) are a way of quantifying the importance of each feature in determining the final prediction outcome of a machine learning model.
They are based on the idea that, by analyzing the outcome of every possible combination of features, we can determine the importance of each single feature on its own. Too complex? Let's walk through this last sentence.
As an example, we will imagine a machine learning model that predicts the price of an apartment knowing how old it is (Age variable), whether it is currently furnished (Is_Furnished variable), and its location (Location variable).
Each node of the above tree contains the features used in the ML model. Each step down the tree adds another feature at our disposal to predict the final outcome.
In the following image, the tree contains 2³ = 8 nodes, one for every possible subset of features, and each top-to-bottom path adds the features in a different order. For example, one can first add Age, then Is_Furnished, then Location (traveling through 1-2-6-8, like in the previous illustration), or take a totally different route, adding Is_Furnished, then Location, then Age (traveling through 1-3-7-8).
Now, in order to calculate SHAP values, one needs to train eight distinct ML models, one for every node in the above tree. All these models predict the same target; what changes is only which features each one is allowed to use.
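To make the tree concrete, here is a short Python sketch that enumerates the 2³ = 8 possible subsets (coalitions) of the three features; each subset corresponds to one box in the illustration, and one model is trained per subset:

```python
from itertools import combinations

features = ["Age", "Is_Furnished", "Location"]

# Every node of the tree is one subset (coalition) of features:
# 2^3 = 8 coalitions, from the empty set up to all three features.
coalitions = [
    subset
    for size in range(len(features) + 1)
    for subset in combinations(features, size)
]

for c in coalitions:
    print(c)

print(len(coalitions))  # 8 models to train, one per coalition
```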
First, let's imagine that we are presented with no information about the apartment: it has some age, it is either furnished or not, and it sits at some location, but all three variables are still unknown to us.
Now, let's imagine we run this apartment through all 8 ML models at the same time in order to quantify its true cost.
Let's analyze what happens step by step.
At Level 0, we need to guess the apartment price without knowing Age, Is_Furnished, or Location. Our best guess at this point is simply to search the internet for What is the average cost of an apartment? and use that as our prediction. This is our starting point, and we find that the average apartment price is $250k (box 1).
At Level 1, we are given the option of knowing one extra detail about the apartment: either Age or Is_Furnished or Location. Let's say we found out that the apartment is 28 years old. Logically, we should adjust our initial prediction of $250k to account for its age, guessing the average price of all apartments GIVEN that they are 28 years old. This brings our prediction down to $200k (box 2).
If, instead of Age, we discovered the apartment's location to be in a very highly valued neighborhood, our prediction for the apartment price would certainly increase: the average price of apartments in that posh neighborhood is more like $500k, which is what we guess in box 4. If we knew both the apartment's age (28 y) and its location (posh neighborhood), we weigh the influence of both variables in predicting the final price, guessing something between $200k and $500k (box 6).
But wait! Why is our prediction in box 6 $425k and not the exact midpoint between knowing Age ($200k, in box 2) and knowing Location ($500k, in box 4)?
This is where the power of SHAP values steps in!
These two pieces of information, Age and Location, do not bring the same amount of information about the apartment's price. In this example, knowing the real estate's location is significantly more informative than knowing its age. Therefore, Location has a greater SHAP value than Age.
The amount of information each variable adds at a single step can also change depending on what we already know about the apartment. Going back to our example: if we already know the apartment's Age and Location, discovering whether it is furnished would shift our prediction by only -$10k (from box 6 to box 8). However, if we only knew the apartment's Location and not its Age (box 4), discovering Is_Furnished would shift our prediction by -$25k (box 7). Even in relative (percentage) terms, the impact is of a different magnitude.
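The two shifts above can be checked with a few lines of Python. The coalition predictions below are the dollar figures taken straight from the example's boxes:

```python
# Predictions (in $k) from the example's tree of models.
v = {
    frozenset(): 250,                                    # box 1: no features
    frozenset({"Age"}): 200,                             # box 2
    frozenset({"Location"}): 500,                        # box 4
    frozenset({"Age", "Location"}): 425,                 # box 6
    frozenset({"Location", "Is_Furnished"}): 475,        # box 7
    frozenset({"Age", "Location", "Is_Furnished"}): 415, # box 8
}

def marginal(feature, coalition):
    """Change in prediction when `feature` joins `coalition`."""
    before = frozenset(coalition)
    return v[before | {feature}] - v[before]

# Adding Is_Furnished when we already know Age and Location: -$10k
print(marginal("Is_Furnished", {"Age", "Location"}))  # -10
# Adding Is_Furnished when we only know Location: -$25k
print(marginal("Is_Furnished", {"Location"}))         # -25
```

The same feature contributes different amounts depending on which coalition it joins, which is exactly why a single "importance" number requires averaging over all of them.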
This phenomenon indicates that, in order to quantify how much information each variable provides to our final model at box 8, we need to account for every possible order in which the variables could have been added along the way to box 8, and average each variable's contribution across all of them.
That is precisely how SHAP values are calculated.
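A minimal sketch of that calculation in Python, reusing the dollar figures from the example. Note one assumption: the predictions for box 3 (Is_Furnished alone) and box 5 (Age + Is_Furnished) are not given in the text, so the values used for them below are invented purely for illustration:

```python
from itertools import permutations

# Predictions (in $k) per coalition. Boxes 1, 2, 4, 6, 7, and 8 come
# from the article's example; boxes 3 and 5 are NOT given in the text,
# so their values here are made up for illustration only.
v = {
    frozenset(): 250,                                    # box 1
    frozenset({"Age"}): 200,                             # box 2
    frozenset({"Is_Furnished"}): 240,                    # box 3: assumed
    frozenset({"Location"}): 500,                        # box 4
    frozenset({"Age", "Is_Furnished"}): 195,             # box 5: assumed
    frozenset({"Age", "Location"}): 425,                 # box 6
    frozenset({"Location", "Is_Furnished"}): 475,        # box 7
    frozenset({"Age", "Location", "Is_Furnished"}): 415, # box 8
}

features = ["Age", "Is_Furnished", "Location"]

def shap_values(v, features):
    """Average each feature's marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orders = list(permutations(features))
    for order in orders:
        seen = frozenset()
        for f in order:
            totals[f] += v[seen | {f}] - v[seen]  # marginal contribution
            seen = seen | {f}
    return {f: totals[f] / len(orders) for f in features}

phi = shap_values(v, features)
print(phi)

# The SHAP values always sum to the gap between the full-model
# prediction and the baseline: 415 - 250 = 165.
print(sum(phi.values()))
```

Whatever values the missing boxes actually take, the resulting SHAP values always add up to the difference between the full prediction (box 8) and the baseline (box 1), which is what makes them an additive explanation.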
How to interpret SHAP values in Pecan
The feature importance widget displays the SHAP values on the PDP graph (Insight view of the right pane).
The graphs place the original feature values on the horizontal axis, and the SHAP values (or Effect On Prediction) on the vertical axis.
Click one of the most relevant features to display its Partial Dependency Plot (PDP).
PDPs help us further understand how specific values of a feature impacted predictions. Therefore, they can help a business identify the characteristics of problematic or successful behavior in its use case.
Imagine trying to predict churn among a business's customers. You might know that days since the last transaction is a very informative feature with a high SHAP value in Pecan; however, you do not yet know how the number of days since the last transaction impacts predictions: 3 or 4 days might not be that informative, while 7 days is. For these insights, Partial Dependency Plots are key.
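As a toy illustration of the idea behind such a plot (all numbers below are made up, not Pecan output), averaging the per-customer effect at each feature value reveals the kind of threshold a PDP makes visible:

```python
from collections import defaultdict

# Hypothetical per-customer values: (days since last transaction,
# that sample's effect on the churn prediction). Invented for illustration.
samples = [
    (3, -0.02), (3, -0.01), (4, 0.00), (4, -0.01),
    (7, 0.15), (7, 0.18), (8, 0.20), (8, 0.17),
]

by_days = defaultdict(list)
for days, effect in samples:
    by_days[days].append(effect)

# Average effect per feature value -- essentially one point of a PDP.
for days in sorted(by_days):
    avg = sum(by_days[days]) / len(by_days[days])
    print(f"{days} days -> average effect on prediction: {avg:+.3f}")
```

In this made-up data, 3 or 4 days barely move the prediction, while 7 or more days push it sharply toward churn, which is the kind of cutoff a PDP lets you read off directly.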
For more information on what PDPs represent and how to read them, see Partial Dependency Plots (PDPs).