Understanding ETA (Entity, Target, Attribute)
Ori Sagi avatar
Written by Ori Sagi
Updated over a week ago

The goal of a Pecan model is to help you make better business decisions by leveraging predictions based on historical data. For that to happen, you first need to understand the business goal you’re trying to achieve. In Pecan, this is done in by clearly defining your predictive question.

Once your predictive question has been established, you need to present your data in a way that aligns with the question and enables machine-learning algorithms to know what to look for and measure. This way, your model will be able to recognize patterns between datasets – and use them to make predictions.

Three types of tables are required in order to train a Pecan model and make predictions on an ongoing basis:

You will generate these “ETA tables” by creating SQL queries in the Blueprint Editor. Pecan will then automatically join them into a flattened, AI-ready dataset so it can be fed into your model for training. This process is illustrated below:

How you construct your ETA queries, and thus instruct the model to interpret your data, will also determine how future datasets should be fed into your model so it can make predictions.

What is an Entity Table?

An Entity Table defines the examples or events you want to learn from. It answers two questions:

  • Who exhibits the behavior you want to predict?

    • e.g. all VIP users, all new US customers

  • When do you want to make a prediction?

    • This may be by a date in time (e.g. the 1st of every month) or a trigger-based event (e.g. 2 days after installation). In either case, when dealing with training data, it will be a date (or timestamp) in the past.

This information will be fed into your model via a table. Below is an example of an Entity Table. It includes customers we want to make predictions for, and when those predictions are to be made (known as marker dates).

customer id

marker

57380

2022-02-30 12:00:00

57380

2022-03-30 12:00:00

57380

2022-04-30 12:00:00

57380

2022-05-30 12:00:00

205013

2022-02-30 12:00:00

205013

2022-03-30 12:00:00

205013

2022-04-30 12:00:00

205013

2022-05-30 12:00:00

When a customer ID number is paired with a marker date, this gives us an “entity” – a single value that represents a particular customer at a particular point in time. (Behind the scenes, each entity will be assigned a unique “Pecan ID”, which will be used to join the ETA tables.)

As your model is trained, it will use what it has learned from your data to: 1) make predictions for the set of past entities, and 2) compare those predictions against what actually happened – that is, whether each customer exhibited the target behavior within a defined period after the marker date (a.k.a. the “prediction window”).

These results will determine whether your model will be selected among several others, and what the performance of the model is.

What is a Target Table?

A Target Table identifies the behavior that is to be predicted for each entity (where an “entity” is a particular customer at a certain point in time.)

It answers the question:

  • What behavior do you want to predict?

Your model might try to predict whether an activity will occur, the likelihood of that activity occurring, the amount of money that might be spent (dollar value), and so on.

In the below example of a Target Table, the target behavior expressed by each entity is located in the right-most column. The goal of the model would be to predict whether each entity will make a purchase within the next 30 days after the marker date. (In this case, 1 = “yes” and 0 = “no”.)

customer id

marker

made a purchase within 30 days of marker

57380

2022-02-30 12:00:00

1

57380

2022-03-30 12:00:00

1

57380

2022-04-30 12:00:00

0

57380

2022-05-30 12:00:00

0

205013

2022-02-30 12:00:00

1

205013

2022-03-30 12:00:00

0

205013

2022-04-30 12:00:00

0

205013

2022-05-30 12:00:00

0

For training purposes, this column will contain historical data – that is, the actual outcome for each entity. Your model will then train itself by comparing the predictions it makes for those entities against what actually happened. (Then, afterwards, the model will test its predictive performance on an untouched set of past data.)

What is an Attribute table?

Attribute tables enrich your records with relevant data that will be valuable for generating predictions. They answer the question:

  • With what information will we be able to answer the predictive question?

A wide variety of data may factor into the target behavior of a customer, such as their transaction history, gaming results, demographic information and NPS score. Indeed, a very large number of attributes may be involved in making a prediction. Machine learning is what enables you to detect patterns among such vast quantities of data, and to create features out of these columns.

Below is an example of an Attribute Table, which contains information about customer behavior and would help form the basis of a model’s predictions.

customer id

transaction date

category

country

campaign id

gender

transaction amount

57380

2022-02-28 08:43:07

Garden

Japan

4041379140569

Female

$11.57

57380

2022-03-14 21:25:00

Shoes

Japan

5010124916284297

Female

$68.22

57380

2022-04-06 06:43:10

Industrial

Japan

6378381216678255

Female

$5.13

205013

2022-02-11 12:13:02

Electronics

Chile

3529969244867448

Male

$25.77

205013

2022-05-26 17:01:18

Sports

Chile

3580247153934236

Male

$91.57

453785

2022-02-17 03:05:10

Electronics

China

201696962969079

Male

$12.49

565479

2022-02-17 03:06:55

Health

Canada

5100130473994344

Female

$32.95

Through automatic processes of feature engineering and feature selection, Pecan will uncover the features that may contribute to a model, and then determine whether the significance is high enough to generate useful predictions (see Understanding Feature Importance).

Pecan joins ETA tables into an AI-ready dataset

Once the Entity, Target and Attribute tables for your model have been defined, all of this customer data can be flattened into an AI-ready dataset.

By using the above examples, we can generate a sample input table:

customer id

marker

category

country

gender

transaction amount

purchased within 30 days of marker

57380

2022-02-30

Garden

Japan

Female

$11.57

1

57380

2022-03-30

Shoes

Japan

Female

$68.22

1

57380

2022-04-30

Industrial

Japan

Female

$5.13

0

57380

2022-05-30

null

Japan

Female

null

0

205013

2022-02-30

Electronics

Chile

Male

$25.77

1

205013

2022-03-30

null

Chile

Male

null

0

205013

2022-04-30

null

Chile

Male

null

0

205013

2022-05-30

Sports

Chile

Male

$91.57

0

As you can see, this table comprises:

  • A unique row for each customer at a particular point in time (where each row is an “entity”)

  • Multiple columns containing properties that are related to each entity (“attribute columns”)

    • In this example, we have both demographic data and transactional data, which will eventually give rise to specific model features.

  • A column that describes what we’re trying to predict (the “target”).

During Pecan’s machine-learning processes, all values will be represented as strings of numbers.

If/when multiple transactions occur for a single entity (a.k.a. customer ID + marker date), a unique “Pecan Number” will be assigned to each transaction so they can all be factored into the model.

Defining your ETA tables will enable Pecan to join them into a flattened AI-ready dataset, and to know what information to look for, learn from, and evaluate the impact of. So how do you get started? In Pecan, you’ll achieve all this by building your ETA queries with SQL.

Did this answer your question?