Underfitting is the inverse of overfitting: your model is unable to make accurate predictions because it was trained on an insufficient amount of data (known as “Attributes” in Pecan). Trained on too little data, the model cannot detect the patterns that drive the outcome you want to predict.
Underfitting is reflected in a low AUC (Area Under the Curve) value. A score of 0.5 means your model performs no better than chance; it does no better than simply predicting the base rate.
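As a conceptual illustration (not part of Pecan, which computes AUC for you), the sketch below uses scikit-learn to show that scores with no relationship to the labels land near 0.5, while scores that carry real signal score much higher. The data here is synthetic and assumed for demonstration only.

```python
# Illustrative sketch: AUC near 0.5 means the scores carry no signal.
# Assumes NumPy and scikit-learn are installed.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(seed=42)
y_true = rng.integers(0, 2, size=10_000)           # synthetic binary labels

random_scores = rng.random(10_000)                 # unrelated to the labels
auc_random = roc_auc_score(y_true, random_scores)  # lands near 0.5 (chance)

# Scores correlated with the label simulate a model that learned something.
informative_scores = y_true * 0.6 + rng.random(10_000) * 0.8
auc_informative = roc_auc_score(y_true, informative_scores)  # well above 0.5

print(f"random AUC: {auc_random:.2f}")
print(f"informative AUC: {auc_informative:.2f}")
```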
How to identify underfitting in Pecan
In the dashboard for your Pecan model, check the metrics of your test set by clicking “Technical details” at the top of the screen.
If your “Holdout AUC” is below 0.65, this is a strong indicator of underfitting: the model’s predictive performance is poor. The ideal range for a predictive model in Pecan is generally 0.65–0.95. (Note: “holdout data” is the 10% of your dataset that’s set aside to test the model after it has been trained and validated.)
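To make the holdout idea concrete, here is a minimal sketch of splitting a dataset into training, validation, and holdout portions. Pecan performs this split for you; the 20% validation share below is an assumption for illustration, while the 10% holdout matches the note above.

```python
# Illustrative train/validation/holdout split (Pecan does this internally).
# The 10% holdout is from the article; the 20% validation share is assumed.
import numpy as np

rng = np.random.default_rng(seed=0)
n_rows = 1_000
indices = rng.permutation(n_rows)          # shuffle row indices

n_holdout = int(n_rows * 0.10)             # 10% held out for final testing
n_validation = int(n_rows * 0.20)          # assumed 20% for validation

holdout_idx = indices[:n_holdout]
validation_idx = indices[n_holdout:n_holdout + n_validation]
train_idx = indices[n_holdout + n_validation:]

print(len(train_idx), len(validation_idx), len(holdout_idx))  # 700 200 100
```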
In Pecan, the model training process is divided into three stages: training, validation, and testing. Your dashboard displays the evaluation metrics for the test set. Low Precision and Recall rates indicate that the model failed to find patterns in the training set, and the underfitting needs to be addressed.
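For reference, here is how precision and recall are computed on a test set. Pecan calculates these in the dashboard; the labels below are made up purely to show the arithmetic.

```python
# Conceptual sketch of precision and recall on a test set
# (Pecan computes these for you; this data is hypothetical).
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # actual outcomes
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]   # model's predicted labels

# Precision: of the rows predicted positive, how many were actually positive?
precision = precision_score(y_true, y_pred)   # 2 correct of 3 predicted

# Recall: of the rows that are actually positive, how many did we catch?
recall = recall_score(y_true, y_pred)         # 2 caught of 4 actual

print(precision, recall)
```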
How to resolve underfitting
In Pecan, underfitting can be resolved by adding more training data to your model: either by obtaining additional raw data, or by deriving new features from your existing ones (known as feature engineering; see /wiki/spaces/PHC/pages/2085683201).
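As a minimal sketch of deriving new features from existing ones, the pandas example below builds per-customer aggregates from raw order rows. The table and column names are hypothetical, not from Pecan; the point is that new derived columns can give an underfit model more signal to learn from.

```python
# Hypothetical feature-engineering sketch: derive per-customer features
# (total spend, average order value, order count) from raw order rows.
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "order_value": [20.0, 35.0, 10.0, 15.0, 50.0],
})

# Named aggregation turns one raw column into three derived features.
features = orders.groupby("customer_id")["order_value"].agg(
    total_spend="sum",
    avg_order_value="mean",
    order_count="count",
).reset_index()

print(features)
```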
Since Pecan already trains several different models and chooses the best one, simply re-training your model without modifying your inputs won’t be effective.