*SHAP* values (SHapley Additive exPlanations) are a way of quantifying the importance of each feature in determining the final prediction outcome of a machine learning model.

It is based on the idea that analyzing the outcome of **each possible combination of features** can determine the importance of **a single feature on its own**. Too complex? Let's walk through this last sentence.

As an example, we will imagine a machine learning model that predicts the price of an apartment knowing how old it is (*Age* variable), if it is currently furnished (*Is_Furnished* variable), and its location (*Location* variable).

**Each node of the above tree contains the features used in the ML model. Each step down the tree adds another feature at our disposal to predict the final outcome.**

In the following image, the tree contains 8 boxes, one for every possible subset of features, and several paths along which all features can be added to the model: *e.g.*, one can first add *Age*, then *Is_Furnished*, then *Location* (traveling through 1-2-6-8, as in the previous illustration), or take a totally different path, adding *Is_Furnished*, then *Location*, then *Age* (traveling through 1-3-7-8).

Now, in order to calculate *SHAP* values, **one needs to train eight distinct ML models, one for every box in the above tree**, that is, one model per subset of features. Of course, every path ends at the same place: Box 8, the model trained on all three features. Only the order in which features were added along the way changes.
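The combinatorics behind the tree can be sketched in a few lines of Python (the feature names are taken from the example above):

```python
from itertools import combinations, permutations

features = ["Age", "Is_Furnished", "Location"]

# Each box in the tree is one subset of the features: 2**3 = 8 boxes,
# from the empty set at the top to the full feature set at the bottom.
boxes = [set(combo)
         for size in range(len(features) + 1)
         for combo in combinations(features, size)]
print(len(boxes))  # 8 -> eight models to train, one per box

# Each root-to-bottom path adds the features in a different order.
orderings = list(permutations(features))
print(len(orderings))  # 3! = 6 possible orders
```

With more features, these counts grow very fast (2^n subsets), which is why practical SHAP implementations approximate this enumeration rather than training every model.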

First, let's imagine that we are presented with no information about the apartment. It is *X* years old, it is either furnished or not, and it is situated at location *Y*, but all of these values are still unknown to us.

Now, let's imagine we run this apartment through all 8 ML models at the same time in order to quantify its true cost.

Let's analyze what happens step by step.

At Level 0, we need to guess the apartment price without knowing *Age*, *Is_Furnished*, or *Location*. Our best guess at this point is simply to search the internet for *What is the average cost of an apartment?* and use that number for the apartment we are analyzing. This is our starting point, and we find that the average apartment price is $250k (box 1).

At Level 1, we are given the option of knowing one extra detail about the apartment: either *Age* **or** *Is_Furnished* **or** *Location*. Let's say we found out that the apartment is 28 years old. Logically, we should adjust our initial prediction of $250k to account for its age, guessing the average price of all apartments **GIVEN** that they are 28 years old. This brings our prediction down to $200k (box 2).
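In code, this "average **GIVEN** age" is simply a conditional average. A minimal sketch with made-up listing prices, chosen so that the numbers line up with boxes 1 and 2:

```python
# Hypothetical listings, prices in $k (illustrative data, not a real dataset).
listings = [
    {"age": 28, "price": 210},
    {"age": 28, "price": 190},
    {"age": 5,  "price": 320},
    {"age": 40, "price": 150},
    {"age": 2,  "price": 380},
]

# Box 1: the unconditional average price.
overall_avg = sum(l["price"] for l in listings) / len(listings)

# Box 2: the average price GIVEN that the apartment is 28 years old.
prices_at_28 = [l["price"] for l in listings if l["age"] == 28]
avg_given_28 = sum(prices_at_28) / len(prices_at_28)

print(overall_avg)   # 250.0 -> the $250k of box 1
print(avg_given_28)  # 200.0 -> the $200k of box 2
```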

If, **instead** of *Age*, we had discovered the apartment's *Location* to be a very highly valued neighborhood, our prediction for the apartment price would certainly increase: the average price of apartments in that posh neighborhood is around $500k, which is what we guess in box 4. If we knew both the apartment's *Age* (28 years) and its *Location* (posh neighborhood), we would weigh the influence of both variables, predicting a final price somewhere between $200k and $500k (box 6).

But wait! Why is our prediction in box 6 $425k and not the exact midpoint between knowing *Age* ($200k, in box 2) and knowing *Location* ($500k, in box 4)?

**This is where the power of *SHAP* values steps in!**

These two pieces of information, *Age* and *Location*, do **not** bring the same **amount of information** about the apartment's price. In this example, knowing the real estate's *Location* is **significantly more informative** than knowing its *Age*. Therefore, *Location* has a greater *SHAP* value than *Age*.

The amount of information a variable adds at a given step can change depending on what we already know about the apartment. Going back to our example: we already know the apartment's *Age* and *Location*, and discovering whether the apartment is furnished would shift our prediction by only -$10k (from box 6 to box 8). However, if we knew only the apartment's *Location* and not its *Age* (box 4), discovering *Is_Furnished* would shift our prediction by -$25k (box 7). Even in relative (percentage) terms, the impact is of a different magnitude.

This phenomenon indicates that, in order to quantify how much information each variable provides to our final model at Box 8, we need to account for all possible interactions each variable had with the others along the way to Box 8.

That is precisely how *SHAP* values are calculated.
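Here is a minimal sketch of that calculation, using the prices from the walkthrough (the prices for boxes 3 and 5, which the example does not show, are hypothetical): each feature's SHAP value is its marginal contribution, averaged over every possible ordering in which the features could have been added.

```python
from itertools import permutations

features = ["Age", "Is_Furnished", "Location"]

# Predicted price (in $k) of the model trained on each subset of features.
# Boxes 3 and 5 are not given in the walkthrough; their values are made up.
prediction = {
    frozenset(): 250,                                     # box 1
    frozenset({"Age"}): 200,                              # box 2
    frozenset({"Is_Furnished"}): 240,                     # box 3 (hypothetical)
    frozenset({"Location"}): 500,                         # box 4
    frozenset({"Age", "Is_Furnished"}): 195,              # box 5 (hypothetical)
    frozenset({"Age", "Location"}): 425,                  # box 6
    frozenset({"Is_Furnished", "Location"}): 475,         # box 7
    frozenset({"Age", "Is_Furnished", "Location"}): 415,  # box 8
}

def shap_value(feature, prediction, features):
    """Average the feature's marginal contribution over all orderings."""
    orderings = list(permutations(features))
    total = 0.0
    for order in orderings:
        added_before = frozenset(order[: order.index(feature)])
        with_feature = prediction[added_before | {feature}]
        without_feature = prediction[added_before]
        total += with_feature - without_feature
    return total / len(orderings)

for f in features:
    print(f, round(shap_value(f, prediction, features), 2))
# Location dominates: its SHAP value is far larger in magnitude than
# Age's, and Is_Furnished contributes the least.
```

By construction, the three SHAP values sum exactly to the difference between the full model's prediction (box 8) and the baseline average (box 1), so every dollar of deviation from the average is attributed to some feature.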

# How to interpret SHAP values in Pecan

The **feature importance widget** displays the SHAP values on the PDP graph (Insight view of the right pane).

The graphs place the original feature values on the horizontal axis and the SHAP values (or Effect On Prediction) on the vertical axis.

Click one of the most relevant features to display its Partial Dependency Plot (PDP).

PDPs help us further understand **how** specific values of a feature impact predictions. They can therefore help a business identify characteristics of problematic or successful behavior in their use case.

Imagine trying to predict churn among a business's customer pool. You might know that *days since the last transaction* is a very informative feature with a high SHAP value in Pecan; however, you do not yet know *how* the number of days since the last transaction impacts predictions: 3 or 4 days might not be that informative, while 7 days is. For these insights, Partial Dependency Plots are key.
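To see why such a plot is informative, here is a hedged sketch: a hand-written, stand-in churn model (not Pecan's) whose risk jumps after 5 days of inactivity, and a partial-dependence estimate obtained by fixing *days since the last transaction* at each candidate value and averaging predictions over the rest of the (made-up) sample.

```python
import random

random.seed(0)

# Stand-in churn model (NOT Pecan's): churn risk jumps once a customer
# has been inactive for 5 or more days.
def churn_probability(days_since_last_transaction, monthly_spend):
    base = 0.05 if days_since_last_transaction < 5 else 0.60
    return min(1.0, base + 0.1 / max(monthly_spend, 1.0))

# A small made-up customer sample; monthly spend in dollars.
customers = [{"monthly_spend": random.uniform(10, 200)} for _ in range(100)]

def partial_dependence(days):
    """Average prediction with `days` fixed and other features as observed."""
    return sum(
        churn_probability(days, c["monthly_spend"]) for c in customers
    ) / len(customers)

for days in [3, 4, 7, 14]:
    print(days, round(partial_dependence(days), 3))
# 3 and 4 days look alike; the curve jumps sharply at 5+ days.
```

Reading the resulting curve tells you exactly where the feature starts to matter, which is the kind of actionable threshold a churn-prevention team can act on.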

For more information on what PDPs represent and how to read them, see Partial Dependency Plots (PDPs).