In a Pecan model, the sensitivity threshold determines which part of your population is predicted to perform the target behavior, and which is predicted not to do so.
Say you’ve created a model that predicts whether customers are likely to churn within the next 90 days. Your model will generate a Pecan Score between 0 and 1 for each customer, which reflects their likelihood to churn relative to the rest of the population. (It’s important to note that this score is not an absolute probability in itself.)
In Pecan, each prediction is placed along a curve, as illustrated below:
Now, in order to take action based on this data, you will need to choose a threshold that divides the population into two groups: those assigned a prediction of 0 (“not likely to churn”) and those assigned a prediction of 1 (“likely to churn”).
If you set the threshold at the 80th percentile, this means that 80% of the population will be assigned a 0 (predicted not to churn), and 20% will be assigned a 1 (predicted to churn). This is illustrated below:
Customers who are above the threshold (on the right side of it) are predicted to churn, while customers below the threshold (on the left side of it) are not.
Likewise, if you shift the threshold to the 40th percentile, 40% of the population will be assigned a 0, and 60% will be assigned a 1, as illustrated below:
What’s important to remember is that your threshold does not affect the likelihood that each customer will actually perform the target behavior.
What it does affect is whether you ultimately assign a positive or negative prediction to each of them.
In summary, the threshold is how you define the answer to the question: “If a customer’s Pecan Score is X, will we predict occurrence or non-occurrence of the target behavior within the defined time frame?”
How you answer that question will depend on your unique business needs, and will in turn determine the business decisions made based on your model.
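To make the percentile idea concrete, here is a minimal sketch in plain Python. It is not Pecan's implementation; the function name `binarize_by_percentile` and the ten sample scores are made up for illustration, and it assumes all scores are distinct.

```python
def binarize_by_percentile(scores, percentile):
    """Assign 1 to scores above the percentile cut and 0 to the rest.

    Illustrative only: assumes distinct scores, so exactly
    `percentile` percent of the population ends up with a 0.
    """
    ranked = sorted(scores)
    # Number of customers that fall below the threshold (predicted 0).
    n_negative = round(len(ranked) * percentile / 100)
    if n_negative >= len(ranked):
        return [0] * len(scores)
    cutoff = ranked[n_negative]  # lowest score still predicted positive
    return [1 if s >= cutoff else 0 for s in scores]


# Hypothetical Pecan Scores for ten customers.
scores = [0.05, 0.12, 0.20, 0.33, 0.41, 0.58, 0.64, 0.77, 0.85, 0.93]

at_80 = binarize_by_percentile(scores, 80)
at_40 = binarize_by_percentile(scores, 40)

# The scores themselves never change; only the 0/1 assignment does.
print(sum(at_80), sum(at_40))  # 2 6
```

With the threshold at the 80th percentile, 2 of the 10 customers (20%) are predicted to churn; at the 40th percentile, 6 of 10 (60%) are.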
How threshold affects precision and detection
Another way to think about the threshold is through its effect on your model’s sensitivity to both correct and incorrect predictions, as communicated through the following metrics:
Precision Rate – the percentage of predictions of the target behavior that were correct
Calculated by: Detected correctly / (Detected correctly + Detected incorrectly)
In other words: True Positives / (True Positives + False Positives)
Detection Rate – the percentage of target behavior that was correctly identified
Calculated by: Detected correctly / (Detected correctly + Ignored incorrectly)
In other words: True Positives / (True Positives + False Negatives)
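The two formulas above can be sketched in a few lines of Python. The function name and the sample predictions are hypothetical; only the formulas themselves come from the definitions above.

```python
def precision_and_detection(predicted, actual):
    """Compute Precision Rate and Detection Rate from 0/1 lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)  # detected correctly
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)  # detected incorrectly
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)  # ignored incorrectly
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    detection = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, detection


# Made-up example: 4 positive predictions, 5 customers who actually churned.
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
actual = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

precision, detection = precision_and_detection(predicted, actual)
print(precision, detection)  # 0.75 0.6
```

Here 3 of the 4 positive predictions were correct (precision of 75%), and 3 of the 5 actual churners were caught (detection of 60%).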
What happens when you choose a higher threshold?
A higher score will be required to predict an instance of the target behavior.
This will result in greater precision but lower detection.
This would be considered a more “conservative” threshold.
What happens when you choose a lower threshold?
A relatively lower score will be enough to predict an instance of the target behavior.
This will result in greater detection but lower precision.
This would be considered a more “liberal” threshold.
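The trade-off can be seen on a small, made-up data set. The sketch below is illustrative only: the scores, outcomes, and cutoff values are invented, and the threshold is expressed as a score cutoff rather than a percentile to keep the example short.

```python
def precision_and_detection(scores, actual, cutoff):
    """Return (precision, detection) when scores >= cutoff are predicted positive."""
    predicted = [1 if s >= cutoff else 0 for s in scores]
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    detection = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, detection


# Hypothetical scores and what each customer actually did (1 = churned).
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
actual = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

conservative = precision_and_detection(scores, actual, cutoff=0.85)
liberal = precision_and_detection(scores, actual, cutoff=0.50)

print(conservative)  # (1.0, 0.6)
print(liberal)       # precision is 5/7, detection is 1.0
```

On this data, the higher cutoff catches only 3 of the 5 churners but makes no mistakes, while the lower cutoff catches all 5 at the price of 2 false positives. Real data will not always move this cleanly at every step, but this is the typical direction of the trade-off.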
Adjusting your threshold in Pecan
You can easily adjust the threshold for any Pecan model. The following preconfigured options are provided by default in the dashboard for your binary model:
Liberal – threshold is set at the 50th percentile (the top 50% of scores are predicted to complete the target behavior)
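Neutral – threshold is set so that the top 30% of scores are predicted to complete the target behavior (the 70th percentile, by the convention used earlier in this article)
Conservative – threshold is set so that the top 10% of scores are predicted to complete the target behavior (the 90th percentile)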
This is how your threshold setting appears in a model’s dashboard:
As explained in the above section, setting a more conservative threshold results in higher precision and lower detection – and a more liberal threshold accomplishes the opposite.
As you adjust your threshold in Pecan, you’ll see how it affects the prediction results and performance metrics in your dashboard, as well as the Venn diagram that represents those results.
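As a rough sketch of what the three presets do, the snippet below flags the top share of scores for each one. It is not Pecan's API: the names `TOP_PERCENT_BY_PRESET` and `predict_with_preset` are hypothetical, and it assumes each preset's figure means "the top N% of scores are predicted positive", following the parenthetical given for the Liberal option.

```python
# Assumed reading of the presets: share of top scores predicted positive.
TOP_PERCENT_BY_PRESET = {"liberal": 50, "neutral": 30, "conservative": 10}


def predict_with_preset(scores, preset):
    """Flag the highest-scoring share of the population for a given preset."""
    n_positive = round(len(scores) * TOP_PERCENT_BY_PRESET[preset] / 100)
    # Indices ordered from highest score to lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:n_positive])
    return [1 if i in flagged else 0 for i in range(len(scores))]


# Hypothetical Pecan Scores for ten customers.
scores = [0.05, 0.12, 0.20, 0.33, 0.41, 0.58, 0.64, 0.77, 0.85, 0.93]

counts = {name: sum(predict_with_preset(scores, name))
          for name in TOP_PERCENT_BY_PRESET}
print(counts)  # {'liberal': 5, 'neutral': 3, 'conservative': 1}
```

Moving from Liberal to Conservative shrinks the group of positive predictions from 5 customers to 1, which is why precision tends to rise and detection tends to fall.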
How should you choose your threshold?
Your choice of threshold is a business decision, generally determined by the action(s) you plan to take based on the model’s predictions. There are certain industry benchmarks for choosing a threshold, but in general, the decision comes down to answering this question: “Are you more oriented towards precision or detection?”
To answer that, you’ll need to consider the following:
What is the cost of acting on a false positive (“Detected incorrectly” prediction)?
If the cost is relatively low, you would be less concerned about false positives, and thus select a more liberal threshold.
If the cost is relatively high, you would want to avoid false positives, and thus select a more conservative threshold.
What is the risk of failing to act on a case the model missed (a false negative, or “Ignored incorrectly” prediction)?
If the risk is relatively high, you would want to avoid them, and thus select a more liberal threshold.
If the risk is relatively low, you would be less concerned about them, and thus select a more conservative threshold.
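One way to reason about these two questions together is to put a rough cost on each kind of error and see which threshold is cheapest overall. The sketch below assumes you have historical scores with known outcomes; the cost figures, data, and function names are all invented for illustration, and Pecan does not necessarily select thresholds this way.

```python
def total_cost(scores, actual, cutoff, cost_fp, cost_fn):
    """Cost of all errors when scores >= cutoff are predicted positive."""
    pairs = list(zip(scores, actual))
    fp = sum(1 for s, a in pairs if s >= cutoff and a == 0)  # detected incorrectly
    fn = sum(1 for s, a in pairs if s < cutoff and a == 1)   # ignored incorrectly
    return fp * cost_fp + fn * cost_fn


def cheapest_cutoff(scores, actual, cost_fp, cost_fn):
    """Try every observed score as a cutoff and keep the cheapest one."""
    candidates = sorted(set(scores))
    return min(candidates,
               key=lambda c: total_cost(scores, actual, c, cost_fp, cost_fn))


# Hypothetical scores and outcomes (1 = target behavior occurred).
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
actual = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

# Missing a case is 10x worse than a false alarm: a liberal cutoff wins.
print(cheapest_cutoff(scores, actual, cost_fp=1, cost_fn=10))  # 0.5

# A false alarm is 10x worse than a miss: a conservative cutoff wins.
print(cheapest_cutoff(scores, actual, cost_fp=10, cost_fn=1))  # 0.85
```

The same model and the same scores lead to very different thresholds once the relative cost of each error changes.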
Example #1: precision over detection (the conservative case)
Say you create a model that predicts which customers are likely to churn within 60 days. Those who are predicted to churn will be contacted by your Customer Service Center about receiving a special loyalty offer. However, your service center is only capable of contacting 3,000 customers per month (resources are limited).
If a “neutral” threshold provides you with 15,000 positive predictions, you would want to raise the threshold in order to minimize false positives (“Detected incorrectly”) and limit your business treatment to the 3,000 customers likeliest to churn. Therefore, precision would be prioritized over detection.
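A capacity limit like this translates directly into a threshold: rank customers by score and keep only as many as you can treat. The sketch below uses a capacity of 3 and six invented customer IDs to stay readable; the function name is hypothetical, and the same logic would apply to 3,000 out of 15,000.

```python
def select_within_capacity(scores_by_customer, capacity):
    """Return the highest-scoring customers that fit the capacity,
    plus the score cutoff that this selection implies."""
    ranked = sorted(scores_by_customer.items(),
                    key=lambda item: item[1], reverse=True)
    selected = ranked[:capacity]
    implied_cutoff = selected[-1][1] if selected else None
    return [customer_id for customer_id, _ in selected], implied_cutoff


# Hypothetical Pecan Scores keyed by customer ID.
scores_by_customer = {
    "c01": 0.91, "c02": 0.15, "c03": 0.78,
    "c04": 0.66, "c05": 0.42, "c06": 0.88,
}

to_contact, implied_cutoff = select_within_capacity(scores_by_customer, capacity=3)
print(to_contact)      # ['c01', 'c06', 'c03']
print(implied_cutoff)  # 0.78
```

The implied cutoff is simply the lowest score among the customers you have room to contact.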
Example #2: detection over precision (the liberal case)
Say your health organization develops a model to predict whether patients are likely to develop a particular type of cancer. If they are predicted to do so, they would be invited to undergo a diagnostic test.
Because the cost of a missed diagnosis would be severe, you would choose a more liberal threshold in order to minimize false negatives (“Ignored incorrectly”). Therefore, detection would be prioritized over precision. However, as the diagnostic test becomes more expensive, the importance of precision may increase and affect the placement of your threshold.
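When detection is the priority, one possible approach is to fix a minimum Detection Rate and then pick the most conservative threshold that still meets it. The sketch below assumes historical scores with known outcomes; the data, target values, and function name are invented for illustration.

```python
def highest_cutoff_for_detection(scores, actual, target_detection):
    """Return the highest score cutoff whose Detection Rate meets the target."""
    total_positive = sum(actual)
    if total_positive == 0:
        return None
    # Walk from the most conservative cutoff toward the most liberal one.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, a in zip(scores, actual) if s >= cutoff and a == 1)
        if tp / total_positive >= target_detection:
            return cutoff
    return None


# Hypothetical scores and outcomes (1 = patient developed the condition).
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
actual = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

# Requiring at least 90% detection forces a liberal cutoff.
print(highest_cutoff_for_detection(scores, actual, target_detection=0.9))  # 0.5

# Accepting 50% detection allows a much more conservative one.
print(highest_cutoff_for_detection(scores, actual, target_detection=0.5))  # 0.85
```

If the diagnostic test becomes more expensive, you could lower the detection target and see how far the cutoff moves toward the conservative end.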
Conclusion
When the business treatment is costly, you would care more about precision and use a more conservative threshold. When the business treatment is less costly, or you are concerned about missing instances of target behavior, you would care more about detection and use a more liberal threshold.