The goal of a Pecan model is to help you make better business decisions by leveraging predictions based on historical data. For that to happen, you first need to understand the business goal you’re trying to achieve. In Pecan, this is done by clearly defining your predictive question.
Once your predictive question has been established, you need to present your data in a way that aligns with the question and enables machine-learning algorithms to know what to look for and measure. This way, your model will be able to recognize patterns between datasets – and use them to make predictions.
Three types of tables are required to train a Pecan model and make predictions on an ongoing basis: an Entity Table, a Target Table, and one or more Attribute Tables. Each is described in detail below.
You will generate these “ETA tables” by creating SQL queries in the Blueprint Editor. Pecan will then automatically join them into a flattened, AI-ready dataset so it can be fed into your model for training. This process is illustrated below:
How you construct your ETA queries, and thus instruct the model to interpret your data, will also determine how future datasets should be fed into your model so it can make predictions.
What is an Entity Table?
An Entity Table defines the examples or events you want to learn from. It answers two questions:
Who exhibits the behavior you want to predict?
e.g. all VIP users, all new US customers
When do you want to make a prediction?
This may be by a date in time (e.g. the 1st of every month) or a trigger-based event (e.g. 2 days after installation). In either case, when dealing with training data, it will be a date (or timestamp) in the past.
This information will be fed into your model via a table. Below is an example of an Entity Table. It includes customers we want to make predictions for, and when those predictions are to be made (known as marker dates).
customer id | marker |
57380 | 2022-02-28 12:00:00 |
57380 | 2022-03-30 12:00:00 |
57380 | 2022-04-30 12:00:00 |
57380 | 2022-05-30 12:00:00 |
205013 | 2022-02-28 12:00:00 |
205013 | 2022-03-30 12:00:00 |
205013 | 2022-04-30 12:00:00 |
205013 | 2022-05-30 12:00:00 |
When a customer ID number is paired with a marker date, this gives us an “entity” – a single value that represents a particular customer at a particular point in time. (Behind the scenes, each entity will be assigned a unique “Pecan ID”, which will be used to join the ETA tables.)
As your model is trained, it will use what it has learned from your data to: 1) make predictions for the set of past entities, and 2) compare those predictions against what actually happened – that is, whether each customer exhibited the target behavior within a defined period after the marker date (a.k.a. the “prediction window”).
These results will determine which of several candidate models is selected, and how well the chosen model performs.
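The entity definition above can be sketched as a SQL query. The snippet below is a minimal illustration only: the `customers` and `markers` tables, their schemas, and the use of a cross join are hypothetical assumptions, not Pecan’s Blueprint Editor syntax.

```python
import sqlite3

# Hypothetical source data: a customers table and a list of marker
# dates (names and schemas are illustrative, not Pecan's).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (customer_id INTEGER, signup_date TEXT);
INSERT INTO customers VALUES (57380, '2022-01-15'), (205013, '2022-01-20');
CREATE TABLE markers (marker TEXT);
INSERT INTO markers VALUES ('2022-02-28 12:00:00'), ('2022-03-30 12:00:00');
""")

# Entity table: every (customer id, marker date) pair is one entity.
entities = con.execute("""
    SELECT c.customer_id, m.marker
    FROM customers c
    CROSS JOIN markers m
    ORDER BY c.customer_id, m.marker
""").fetchall()
print(entities)
```

Each row of the result is one entity: the same customer appears once per marker date, just as in the table above.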
What is a Target Table?
A Target Table identifies the behavior that is to be predicted for each entity (where an “entity” is a particular customer at a certain point in time.)
It answers the question:
What behavior do you want to predict?
Your model might try to predict whether an activity will occur, the likelihood of that activity occurring, the amount of money that might be spent (dollar value), and so on.
In the example of a Target Table below, the target behavior expressed by each entity is located in the right-most column. The goal of the model would be to predict whether each entity will make a purchase within 30 days after the marker date. (In this case, 1 = “yes” and 0 = “no”.)
customer id | marker | made a purchase within 30 days of marker |
57380 | 2022-02-28 12:00:00 | 1 |
57380 | 2022-03-30 12:00:00 | 1 |
57380 | 2022-04-30 12:00:00 | 0 |
57380 | 2022-05-30 12:00:00 | 0 |
205013 | 2022-02-28 12:00:00 | 1 |
205013 | 2022-03-30 12:00:00 | 0 |
205013 | 2022-04-30 12:00:00 | 0 |
205013 | 2022-05-30 12:00:00 | 0 |
For training purposes, this column will contain historical data – that is, the actual outcome for each entity. Your model will then train itself by comparing the predictions it makes for those entities against what actually happened. (Afterwards, the model’s predictive performance will be tested on a held-out set of past data.)
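A target label like “made a purchase within 30 days of marker” can be computed from raw events with a query along these lines. This is a sketch under stated assumptions: the `entities` and `transactions` tables and the 30-day window expressed with SQLite’s `date()` function are illustrative, not Pecan’s actual implementation.

```python
import sqlite3

# Hypothetical entity and transaction data (schemas are illustrative).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE entities (customer_id INTEGER, marker TEXT);
INSERT INTO entities VALUES (57380, '2022-02-28'), (57380, '2022-04-30');
CREATE TABLE transactions (customer_id INTEGER, transaction_date TEXT);
INSERT INTO transactions VALUES (57380, '2022-03-14');
""")

# Target table: 1 if the customer made any purchase in the 30 days
# after the marker date (the "prediction window"), else 0.
targets = con.execute("""
    SELECT e.customer_id, e.marker,
           CASE WHEN EXISTS (
               SELECT 1 FROM transactions t
               WHERE t.customer_id = e.customer_id
                 AND t.transaction_date > e.marker
                 AND t.transaction_date <= date(e.marker, '+30 days')
           ) THEN 1 ELSE 0 END AS made_purchase_within_30_days
    FROM entities e
""").fetchall()
print(targets)
```

The purchase on 2022-03-14 falls inside the window after the first marker but not the second, so the two entities get labels 1 and 0 respectively.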
What is an Attribute Table?
Attribute tables enrich your records with relevant data that will be valuable for generating predictions. They answer the question:
With what information will we be able to answer the predictive question?
A wide variety of data may factor into the target behavior of a customer, such as their transaction history, gaming results, demographic information and NPS score. Indeed, a very large number of attributes may be involved in making a prediction. Machine learning makes it possible to detect patterns across such vast quantities of data, and to create model features out of these columns.
Below is an example of an Attribute Table, which contains information about customer behavior and would help form the basis of a model’s predictions.
customer id | transaction date | category | country | campaign id | gender | transaction amount |
57380 | 2022-02-28 08:43:07 | Garden | Japan | 4041379140569 | Female | $11.57 |
57380 | 2022-03-14 21:25:00 | Shoes | Japan | 5010124916284297 | Female | $68.22 |
57380 | 2022-04-06 06:43:10 | Industrial | Japan | 6378381216678255 | Female | $5.13 |
205013 | 2022-02-11 12:13:02 | Electronics | Chile | 3529969244867448 | Male | $25.77 |
205013 | 2022-05-26 17:01:18 | Sports | Chile | 3580247153934236 | Male | $91.57 |
453785 | 2022-02-17 03:05:10 | Electronics | China | 201696962969079 | Male | $12.49 |
565479 | 2022-02-17 03:06:55 | Health | Canada | 5100130473994344 | Female | $32.95 |
Through automatic processes of feature engineering and feature selection, Pecan will uncover the features that may contribute to a model, and then determine whether the significance is high enough to generate useful predictions (see Understanding Feature Importance).
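An Attribute Table query is typically a straightforward selection of row-level history keyed by the entity’s ID column. The table name and schema below are hypothetical assumptions for illustration.

```python
import sqlite3

# Hypothetical raw transactions table (schema is illustrative).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE transactions (
    customer_id INTEGER, transaction_date TEXT,
    category TEXT, country TEXT, amount REAL
);
INSERT INTO transactions VALUES
    (57380,  '2022-02-28 08:43:07', 'Garden',      'Japan', 11.57),
    (57380,  '2022-03-14 21:25:00', 'Shoes',       'Japan', 68.22),
    (205013, '2022-02-11 12:13:02', 'Electronics', 'Chile', 25.77);
""")

# Attribute table: row-level history keyed by customer_id. Unlike the
# Entity table, multiple rows per customer are expected here.
attributes = con.execute("""
    SELECT customer_id, transaction_date, category, country, amount
    FROM transactions
    ORDER BY customer_id, transaction_date
""").fetchall()
print(attributes)
```

Note that no aggregation is done here; turning these raw rows into model features is handled by Pecan’s automatic feature engineering.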
Pecan joins ETA tables into an AI-ready dataset
Once the Entity, Target and Attribute tables for your model have been defined, all of this customer data can be flattened into an AI-ready dataset.
Using the examples above, we can generate a sample input table:
customer id | marker | category | country | gender | transaction amount | purchased within 30 days of marker |
57380 | 2022-02-28 | Garden | Japan | Female | $11.57 | 1 |
57380 | 2022-03-30 | Shoes | Japan | Female | $68.22 | 1 |
57380 | 2022-04-30 | Industrial | Japan | Female | $5.13 | 0 |
57380 | 2022-05-30 | null | Japan | Female | null | 0 |
205013 | 2022-02-28 | Electronics | Chile | Male | $25.77 | 1 |
205013 | 2022-03-30 | null | Chile | Male | null | 0 |
205013 | 2022-04-30 | null | Chile | Male | null | 0 |
205013 | 2022-05-30 | Sports | Chile | Male | $91.57 | 0 |
As you can see, this table comprises:
A unique row for each customer at a particular point in time (where each row is an “entity”)
Multiple columns containing properties that are related to each entity (“attribute columns”)
In this example, we have both demographic data and transactional data, which will eventually give rise to specific model features.
A column that describes what we’re trying to predict (the “target”).
During Pecan’s machine-learning processes, all values will be represented numerically.
If/when multiple transactions occur for a single entity (a.k.a. customer ID + marker date), a unique “Pecan Number” will be assigned to each transaction so they can all be factored into the model.
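The flattening step described above can be sketched as a left join from entities to attributes, with the target carried along. This is a simplified illustration: the table names, and the assumption that only attribute rows from the 30 days up to the marker are matched, are ours, not Pecan’s actual join logic. Entities with no matching attribute rows keep nulls, just as in the sample input table.

```python
import sqlite3

# Hypothetical pre-built entity (with target) and attribute tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE entities (customer_id INTEGER, marker TEXT, target INTEGER);
INSERT INTO entities VALUES (57380, '2022-02-28', 1), (57380, '2022-05-30', 0);
CREATE TABLE attributes (customer_id INTEGER, transaction_date TEXT,
                         category TEXT, amount REAL);
INSERT INTO attributes VALUES (57380, '2022-02-28', 'Garden', 11.57);
""")

# Flattened dataset: one row per entity, left-joined to attribute rows
# dated within the 30 days up to the marker (NULLs where none exist).
flat = con.execute("""
    SELECT e.customer_id, e.marker, a.category, a.amount, e.target
    FROM entities e
    LEFT JOIN attributes a
      ON a.customer_id = e.customer_id
     AND a.transaction_date >= date(e.marker, '-30 days')
     AND a.transaction_date <= e.marker
""").fetchall()
print(flat)
```

The first entity picks up its matching transaction, while the second entity (with no activity near its marker) is emitted with nulls in the attribute columns, mirroring the null rows in the table above.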
Defining your ETA tables enables Pecan to join them into a flattened, AI-ready dataset, and tells your model what information to look for, learn from, and evaluate the impact of. So how do you get started? In Pecan, you’ll achieve all this by building your ETA queries with SQL.