Methods of importing data to Pecan

Import only new rows, import the full table, or overwrite recent data. Choosing the right method keeps model training accurate and efficient.

Written by Ori Sagi
Updated over a week ago

Pecan offers a variety of connectors so you can easily connect different data sources to the platform. This enables you to create data connections in either direction: for importing data to Pecan, and for exporting predictions from Pecan.

Choosing the right import method ensures your model is fed the correct data and can make predictions for the correct entities.

There are three methods for importing your data into Pecan: import new rows, full table import, and overwrite.

How to select an automatic import method

The first time you import a table from the “Connections” screen, you’ll be asked to select an automatic import method. This determines how new data is imported automatically when the table is used by a model that generates scheduled predictions.

Here’s what you’ll see the first time you click Import data:

After a table has been imported once, you’ll be able to adjust its automatic import method at any time (as shown below):

Three ways to import tables

Here, we explain how each method works and when it’s appropriate to use.

1. Import new rows

When this method is selected:

  1. All relevant data is imported the first time.

  2. Only new records will be imported on an ongoing basis. (Pecan checks for new data and adds only new rows to the existing table.)

This method is appropriate for tables that have new rows added on a continual basis, and whose earlier rows are never updated. Common examples of this include transaction history tables and customer activity tables. In such cases, not only do older rows not need to be deleted – they may still be relevant for making predictions.

When using this method to import data from a data service, you'll need to define a reference column. This column, typically a date or timestamp, helps Pecan distinguish between new and pre-existing records, so that new rows are imported and existing rows are skipped. This saves time and compute costs when processing extremely large tables.
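To make the role of the reference column concrete, here is a minimal Python sketch of the general idea. It is not Pecan's implementation: the function name import_new_rows, the sample column names, and the strict "newer than the latest imported value" comparison are all assumptions made for illustration.

```python
from datetime import datetime


def import_new_rows(existing, source, reference_column):
    """Return the existing rows plus any source rows newer than the latest imported one."""
    if not existing:
        # First import: nothing to compare against, so take every row.
        return list(source)
    # The latest reference value already imported acts as a high-water mark.
    watermark = max(row[reference_column] for row in existing)
    # Assumption: only rows strictly newer than the watermark count as new.
    new_rows = [row for row in source if row[reference_column] > watermark]
    return existing + new_rows


existing = [
    {"order_id": 1, "created_at": datetime(2024, 1, 1)},
    {"order_id": 2, "created_at": datetime(2024, 1, 2)},
]
source = existing + [{"order_id": 3, "created_at": datetime(2024, 1, 3)}]

table = import_new_rows(existing, source, "created_at")
print([row["order_id"] for row in table])
```

Because earlier rows are never re-read in this sketch, a change made to an already-imported row in the source would go unnoticed, which is why this method suits tables whose earlier rows are never updated.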

2. Full table import

When this method is selected, Pecan will import the entire table. Any existing rows for that table will be overwritten. In other words, any data that already exists on Pecan’s servers will be deleted and replaced with the new data.

This method is appropriate for tables that are frequently updated, where the date of each change is not consequential. Here are a few common examples:

  • A customer status table that includes information like membership status, demographics, etc.

  • A product information table that contains information like SKUs, pricing, etc.

  • Tables used to match user IDs from different systems (e.g. appsflyer_id and amplitude_id).
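The replacement behavior described above can be pictured with a short Python sketch. It is a simplified illustration rather than Pecan's implementation, and the function name full_table_import and the sample customer rows are hypothetical.

```python
def full_table_import(existing, source):
    """Discard everything imported previously and return a fresh copy of the source table."""
    # The existing rows are deliberately ignored: nothing is merged or compared.
    return [dict(row) for row in source]


existing = [
    {"customer_id": 1, "status": "trial"},
    {"customer_id": 2, "status": "active"},
]
# In the source, customer 1 was updated, customer 2 was removed, and customer 3 is new.
source = [
    {"customer_id": 1, "status": "active"},
    {"customer_id": 3, "status": "trial"},
]

table = full_table_import(existing, source)
print(table)
```

Updates and deletions in the source are both reflected after the import, at the cost of re-reading the whole table each time.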

3. Overwrite

This method ensures Pecan has the most up-to-date records, so models are not being trained or generating predictions based on the wrong data.

It makes it easy to handle situations where tables are backfilled or modified after they have been imported to Pecan. This may happen for a variety of reasons, such as errors or downtime in systems that produce data, delayed transfers of data, or corrections being needed.

When this method is selected:

  1. Data from a particular date onward is deleted from Pecan’s servers.

  2. Pecan imports new rows beginning from that date (and continuing onward), adding them to the existing table.

If you choose this method, you’ll need to define how far back to overwrite data (as demonstrated in the below screenshot). If you select “30”, for example, 30 days' worth of data will be overwritten – starting from the most recent record in the table. (Note that each day starts/ends at midnight UTC.)

If using this method to import data from a data service, you'll need to define a reference column. This is typically a date or timestamp column, and is used as an identifier for each activity. It serves as a reference so you can select how many days' worth of data to overwrite. (It also prevents re-importing rows unnecessarily, thereby saving time and compute costs when importing large tables.)
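The delete-then-reimport steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not Pecan's implementation: the function name overwrite_import, the sample columns, and the exact way the cutoff is rounded down to midnight UTC are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def overwrite_import(existing, source, reference_column, days_back):
    """Replace the most recent `days_back` days of rows with fresh rows from the source."""
    latest = max(row[reference_column] for row in existing)
    # Assumption: count back from the most recent record, then align to midnight UTC.
    cutoff = (latest - timedelta(days=days_back)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # Step 1: drop everything from the cutoff onward.
    kept = [row for row in existing if row[reference_column] < cutoff]
    # Step 2: re-import rows from the cutoff onward and add them to what was kept.
    refreshed = [row for row in source if row[reference_column] >= cutoff]
    return kept + refreshed


def utc(day, hour=0):
    """Helper for building sample timestamps in January 2024 (UTC)."""
    return datetime(2024, 1, day, hour, tzinfo=timezone.utc)


existing = [
    {"event_id": 1, "event_time": utc(1), "amount": 10},
    {"event_id": 2, "event_time": utc(10), "amount": 5},
    {"event_id": 3, "event_time": utc(12, 12), "amount": 8},
]
source = [
    {"event_id": 1, "event_time": utc(1), "amount": 10},
    {"event_id": 2, "event_time": utc(10), "amount": 7},  # corrected after import
    {"event_id": 4, "event_time": utc(11), "amount": 3},  # backfilled late
    {"event_id": 3, "event_time": utc(12, 12), "amount": 8},
    {"event_id": 5, "event_time": utc(13), "amount": 9},  # genuinely new
]

result = overwrite_import(existing, source, "event_time", days_back=3)
print([row["event_id"] for row in result])
```

In this example the corrected and backfilled rows inside the three-day window are picked up, while the older row is left untouched and is not re-imported.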

When importing hosted Parquet or CSV files, a standard date data-type partition is required for this feature to work. In this case, the partition date is used to determine how many days' worth of data to overwrite.
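For hosted files, the idea of using the partition date can be illustrated with a small Python sketch. The folder layout (date=YYYY-MM-DD in the path), the function name partitions_to_overwrite, and the inclusive cutoff are assumptions chosen for the example. They do not describe the layout Pecan requires.

```python
import re
from datetime import date, timedelta

# Assumption: each file path contains a partition folder such as "date=2024-01-10".
PARTITION_PATTERN = re.compile(r"date=(\d{4}-\d{2}-\d{2})")


def partitions_to_overwrite(paths, days_back):
    """Return the paths whose partition date falls within the last `days_back` days."""
    dated = []
    for path in paths:
        match = PARTITION_PATTERN.search(path)
        if match:
            dated.append((date.fromisoformat(match.group(1)), path))
    if not dated:
        return []
    latest = max(partition_date for partition_date, _ in dated)
    # Assumption: count back from the most recent partition, cutoff inclusive.
    cutoff = latest - timedelta(days=days_back)
    return sorted(path for partition_date, path in dated if partition_date >= cutoff)


paths = [
    "events/date=2024-01-01/part-0.parquet",
    "events/date=2024-01-09/part-0.parquet",
    "events/date=2024-01-10/part-0.parquet",
]
selected = partitions_to_overwrite(paths, days_back=2)
print(selected)
```

Files without a recognizable partition date are skipped here, which mirrors why a date partition is needed for the overwrite window to be determined at all.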

Note: this method is only available after a table has been imported for the first time.

How to change a table’s import method

You can always adjust a table’s automatic import method, as shown below.

Here’s how:

  1. Log in to Pecan, go to “Connections” and select the connection that contains the table.

  2. Locate the relevant table and open its Settings dropdown menu.

  3. Click Select automatic import method and make your selection.

How to view the import history of a table

For any Pecan data connection, you can see the import (and export) history of each table. As you can see below, a history log shows which import method was used for which table, and when.

To view the import and export history for any connection:

  1. Log in to Pecan, go to “Connections” and select your desired connection.

  2. Click the “History” tab.
