Importing an entire table to Pecan (a.k.a. “Full table import”) requires more time and computing resources than importing just a portion of the same table (a.k.a. “Import new rows”). This can make for a dramatic difference when working with extremely large tables, especially on a recurring basis.
Importing an entire table can be avoided through the use of a reference column. A reference column, which typically contains date or timestamp values, makes it possible to recognize new vs. pre-existing rows. This way, whenever you re-import data for a particular table, Pecan can instantly recognize which rows need to be imported – and which can be skipped.
When do you not need a reference column?
You should not add a reference column to tables that are based on dimensional data – that is, where there is a single row per customer, and thus a one-to-one correspondence between the customer and each field.
Such is the case in the sample table below:
Basic Phone + Internet
Unlimited Phone + Internet + Roaming
Therefore, any time you want to generate a new prediction, you will want to take a new snapshot of the entire table.
When do you need a reference column?
A reference column is useful when working with transactional tables. These tables contain an ever-growing record of transactions, meaning you will have multiple rows of data for certain customers, and thus a many-to-one relationship between each customer and other fields.
Such is the case in below sample table, where a separate row exists for each customer transaction.
Home stereo system
Since you’ll want to make predictions for an ever-growing data set, this table will need to be re-imported on a regular basis. “Transaction date” would serve as the reference column – enabling Pecan to recognize older records and thus skip re-importing them each time.
Why use a reference column?
When new transactional data needs to be fed into a model, this often amounts to many thousands or millions of rows – many of which have already been imported. By including a reference column, you will enable Pecan to recognize pre-existing rows and skip importing them. This achieves two things:
Dramatically speeds up the importing process
Helps you avoid hitting query limits from your data-service provider (and potentially incurring additional data expenses)