Understanding Area Under the Curve (AUC)
Written by Ori Sagi

In Pecan, AUC (Area Under the Curve) is a key metric for evaluating the predictive performance of your model. It is calculated as the area under your ROC curve, which shows the diagnostic ability of a classification model as its decision threshold is varied (“when the rate of false positives is X, the rate of true positives is Y”).

As a typical ROC curve illustrates:

  • A high AUC indicates a well-performing model – a high rate of true positives (“Detected correctly”) and a low rate of false positives (“Detected incorrectly”).

  • An AUC of 1.0 indicates perfect predictive performance on the evaluation data – a suboptimal outcome that is usually the result of overfitting rather than a genuinely ideal model.

  • An AUC of 0.5 suggests that the model is not able to discriminate between classes, and is thus equivalent to making predictions at random.
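
To see how these numbers arise, here is a minimal sketch using scikit-learn (the labels and predicted probabilities below are made up purely for illustration). Each point on the ROC curve is a false-positive-rate / true-positive-rate pair produced by one possible decision threshold, and AUC is the area under that curve.

    # Minimal sketch: computing an ROC curve and its AUC with scikit-learn.
    # The labels and scores are illustrative, not data from a real Pecan model.
    from sklearn.metrics import roc_curve, roc_auc_score

    y_true = [0, 0, 1, 1, 0, 1, 0, 1]                     # actual outcomes (1 = target behavior)
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]   # model's predicted probabilities

    # Each (false positive rate, true positive rate) pair corresponds to one threshold.
    fpr, tpr, thresholds = roc_curve(y_true, y_score)

    # AUC is the area under that curve: ~0.5 means random guessing, 1.0 means perfect separation.
    print(f"AUC: {roc_auc_score(y_true, y_score):.3f}")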

From a data science perspective, AUC is considered an especially robust metric because it evaluates the model’s performance as a whole, without reference to any specific threshold. On the flip side, unlike threshold-dependent metrics such as precision and recall, it cannot be tied directly to a specific business need.
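
To make this contrast concrete, the illustrative sketch below (again with made-up data) computes precision and recall at a few different cutoffs and then the AUC: the threshold-dependent metrics shift as the cutoff moves, while the AUC stays the same because it summarizes all thresholds at once.

    # Illustrative only: precision and recall depend on the chosen threshold, AUC does not.
    import numpy as np
    from sklearn.metrics import precision_score, recall_score, roc_auc_score

    y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
    y_score = np.array([0.15, 0.40, 0.55, 0.80, 0.30, 0.90, 0.60, 0.70, 0.45, 0.20])

    for threshold in (0.3, 0.5, 0.7):
        y_pred = (y_score >= threshold).astype(int)   # turn probabilities into labels at this cutoff
        print(f"threshold={threshold:.1f}  "
              f"precision={precision_score(y_true, y_pred):.2f}  "
              f"recall={recall_score(y_true, y_pred):.2f}")

    # A single AUC value, independent of any threshold choice.
    print(f"AUC: {roc_auc_score(y_true, y_score):.3f}")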

What should your AUC score be?

The ideal range for an AUC score in Pecan is generally 0.65–0.95. You can view this score in the dashboard for your Pecan model by opening the “Models” tab and clicking “Technical details” at the top of the screen.

Why should you aim for this range? Your model’s AUC reflects the tradeoff between precision and recall. If the value is high, your model will identify many instances of the target behavior, and most of those predictions will be labeled correctly.

But if the score is too high (i.e. above 0.95), the model will not generalize well to new data, and will thus be unable to perform the predictions it was intended for. For example, a score of 1 indicates perfect reliability of your model – every instance of target behavior is predicted, and there are no “incorrectly detected” instances (a.k.a. false positives). In this case, the model’s close correspondence to your training data actually impedes its ability to make predictions for future data. (For more information, see What is overfitting?)

Conversely, if the score is too low (i.e. below 0.65), this indicates poor predictive performance – the model struggles to distinguish instances of the target behavior from non-instances. (For more information, see What is underfitting?)
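
One practical way to check for both conditions is to compare AUC on the data a model was trained on with AUC on held-out data. The sketch below is purely illustrative – the dataset, model, and split are arbitrary stand-ins, not Pecan internals – but it shows the idea: a large gap between the two scores points to overfitting, while low scores on both point to underfitting.

    # Illustrative overfitting/underfitting check: compare training AUC vs. held-out AUC.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

    model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

    train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
    val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

    # A big gap (e.g. 1.00 vs. 0.75) suggests overfitting; two low scores suggest underfitting.
    print(f"training AUC={train_auc:.3f}  validation AUC={val_auc:.3f}")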
