Feature selection is an important preprocessing step in building machine-learning models, and will now occur automatically whenever a model is trained.
Feature selection automatically detects and filters out redundant features that don’t contribute to your model’s accuracy, meaning they provide no meaningful information gain. This reduces the number of features and data dimensions that need to be computed, and can cut training time by up to 20%.
Adding feature selection to the data pipeline provides four key advantages:
Saves compute costs and processing time
Tends to improve overall model accuracy
Creates space for additional features
Improves model explainability (how well features are able to explain predictions)
Pecan takes action by detecting four undesirable behaviors of features:
High null rate – removes columns that have a null rate over 95% and do not correlate with the target activity.
Correlated features – detects pairs of features whose correlation is over 95% and keeps the one that’s easier to interpret.
Extreme entropy – removes columns with extremely high or low entropy (above 99% or below 1%).
Entropy here is a measure of how varied the values in a column are. In a useful feature, it will not be extreme: 100% would mean that all values in a column are different, and 0% would mean that all values in a column are the same. In either case, the column wouldn’t provide valuable information to the model.
Features with same entropy – detects columns that classify data in the same way and have the same effect on the model, and then selects the one that’s easier to interpret.
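To make the checks above more concrete, here is a minimal Python sketch of three of them (null rate, extreme entropy, and correlated features). It is an illustration only, not Pecan’s implementation: the function names, the normalization of entropy to a 0–1 scale, and the sample data are all assumptions. It also simplifies two rules: it skips the target-correlation exception for high-null columns, and it keeps whichever correlated column it sees first rather than the one that is easier to interpret.

```python
import math
from collections import Counter

# Thresholds taken from the release notes; everything else is illustrative.
NULL_RATE_MAX = 0.95
CORRELATION_MAX = 0.95
ENTROPY_MIN, ENTROPY_MAX = 0.01, 0.99


def null_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def normalized_entropy(values):
    """Shannon entropy of the non-null values, scaled to the range 0..1.

    0.0 means every value is identical; 1.0 means every value is different.
    (Assumed normalization: divide by the log of the number of values.)
    """
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return 0.0
    n = len(present)
    counts = Counter(present)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n)


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns):
    """Return the names of columns that survive the three simplified filters."""
    kept = []
    for name, values in columns.items():
        if null_rate(values) > NULL_RATE_MAX:
            continue  # simplified: the target-correlation exception is omitted
        h = normalized_entropy(values)
        if h < ENTROPY_MIN or h > ENTROPY_MAX:
            continue
        # Simplified: drop a column that is highly correlated with one already kept.
        if any(abs(pearson(values, columns[k])) > CORRELATION_MAX for k in kept):
            continue
        kept.append(name)
    return kept


# Hypothetical sample data with 40 rows per column.
plan = [1, 1, 2, 2, 3, 3, 1, 2] * 5
columns = {
    "plan": plan,
    "plan_copy": [2 * v for v in plan],  # perfectly correlated with "plan"
    "constant": [7] * 40,                # entropy of 0%
    "row_id": list(range(40)),           # entropy of 100%
    "mostly_null": [None] * 39 + [1],    # null rate of 97.5%
    "tenure": [1, 2, 3, 4] * 10,         # informative and not redundant
}

print(select_features(columns))  # ['plan', 'tenure']
```

In this sample, only `plan` and `tenure` survive: each of the other four columns trips exactly one of the filters.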
Technical details in your dashboard
When you click the Technical details button in your dashboard, you can now view the model’s most recent training time. This helps you better estimate runtime for similar models in the future.
The “Next run” bug has been fixed, so you can now see the next scheduled date for generating predictions. Note that this value may be irrelevant if you have asked your Pecan Analyst to manually select a different date.
To create more consistency in language across our platform, the term “Holdout” has been changed to “Test Set” in model dashboards.
We have introduced tooltips to provide key information about each field.
Here is a view of updated content in the “Technical details” box:
Starting with a predictive question
Available only in beta
Building your model’s blueprint now starts with the basic step of defining your predictive question. This makes it easier to conceptualize the goal of your model and the data it will require. It can also help users understand and edit the blueprint even if they’re not SQL-proficient.
In an effort to better communicate technical issues with model training, we have made certain updates:
Pecan is now able to deliver different types of email notifications to different types of users, which enables us to customize the complexity of messaging for troubleshooting.
You’ll now see whether null values were found in the Entity during the training or prediction phase.
To help you understand and resolve issues more efficiently, links to related help content are now embedded within error messages in the platform. You’ll see them in the notification center, and, in cases where training has failed, in the system error message itself.
This is how errors (and now links) are displayed in the notification center:
Available only in beta
You can now view the details of your connection (such as access keys and root directory) under the “Connection” tab in the data import section.