Once you've configured the connection between your data source and Pecan, it's time to import the actual data. Here’s how to do it:
1. Import metadata of the tables you want to use
Your data source may contain a number of tables, which will appear once you create a connection to Pecan.
Your goal is to import tables containing any relevant data that can be used to build and train a predictive model.
Note: you do not need to import all of your tables at once. You can always return to this screen and import additional data if needed.
In Pecan, hover over each relevant table and click Fetch metadata to retrieve its list of columns and their types.
2. Configure import settings
Once Pecan has the metadata of a table, hover over its row and click Import data.
A dialog box will appear for each table you wish to import. A table may contain anywhere from a handful to dozens of columns.
Select your desired columns
You’ll want to import any columns that may be useful to your model, and exclude any columns that contain negligible, sensitive or undesired data.
Select a reference column
When you collect new data on a regular basis (incremental data collection), importing an entire table (also known as “taking a snapshot”) requires more time and computing resources than importing only the new rows. The difference is significant when working with extremely large tables.
A reference column avoids this cost. It contains values (typically a date or timestamp) that make it possible to recognize which rows are new and which are not. This way, whenever you re-import data for a particular table, Pecan can recognize which rows are new and need to be imported, and which can be skipped. To learn more, see What is a reference column and when do you need one?
If you do not select a reference column, you’ll be asked to confirm the decision, since skipping it may add several hours of processing time the next time you import data from that table.
Currently, Pecan supports adding a reference column when you are importing data from a server or database – not from S3 Parquet files.
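To illustrate the idea behind a reference column, here is a minimal conceptual sketch in Python. It is not Pecan's implementation; the table, the `updated_at` column, and the `select_new_rows` helper are all hypothetical names chosen for this example. It assumes that "new" means a reference value strictly later than the latest value seen in the previous import.

```python
from datetime import datetime


def select_new_rows(rows, reference_column, last_imported):
    """Return only the rows whose reference value is newer than the last import."""
    if last_imported is None:
        # No earlier import to compare against: take a full snapshot.
        return list(rows)
    return [row for row in rows if row[reference_column] > last_imported]


# Hypothetical source table; "updated_at" plays the role of the reference column.
source_table = [
    {"order_id": 1, "updated_at": datetime(2023, 1, 5)},
    {"order_id": 2, "updated_at": datetime(2023, 1, 9)},
    {"order_id": 3, "updated_at": datetime(2023, 2, 1)},
]

# First import: nothing has been imported yet, so every row is taken.
first_batch = select_new_rows(source_table, "updated_at", last_imported=None)
latest_seen = max(row["updated_at"] for row in first_batch)

# A new row arrives in the source before the next import.
source_table.append({"order_id": 4, "updated_at": datetime(2023, 2, 7)})

# Second import: only rows newer than the latest value already seen are taken.
second_batch = select_new_rows(source_table, "updated_at", latest_seen)
print([row["order_id"] for row in second_batch])
```

In this sketch the second import touches one row instead of four. Without a reference value to compare against, the only safe option is the full snapshot, which is why skipping the reference column costs more on large tables.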
3. Advanced settings (if needed)
To save time and computational costs, you can tell Pecan to only import data after a certain date by selecting a start date.
Another option is to use data partitioning.
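The effect of a start date can be pictured as a simple filter. The sketch below is an illustration only, not Pecan's behavior: the `events` table, the `event_date` column, and the `rows_from_start_date` helper are hypothetical, and it assumes the start date itself is included, which this guide does not specify.

```python
from datetime import date


def rows_from_start_date(rows, date_column, start_date):
    """Keep rows dated on or after start_date (inclusive boundary is an assumption)."""
    return [row for row in rows if row[date_column] >= start_date]


# Hypothetical table with several years of history.
events = [
    {"event_id": "a", "event_date": date(2019, 6, 1)},
    {"event_id": "b", "event_date": date(2021, 12, 31)},
    {"event_id": "c", "event_date": date(2022, 1, 1)},
    {"event_id": "d", "event_date": date(2023, 3, 15)},
]

# Only rows from the start date onward would be imported.
recent = rows_from_start_date(events, "event_date", date(2022, 1, 1))
print([row["event_id"] for row in recent])
```

Older rows that are unlikely to help the model are never transferred, which is where the time and cost savings come from.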
4. Import the data
Once you have selected the columns to import and, where applicable, a reference column, all that’s left to do is click Import data.
Data from your tables will then be imported to Pecan’s servers. This process may take several hours, during which you can navigate away and continue working as usual.
Once the process is complete, you will be able to see how many rows have been imported from each table, or whether there was an issue.
That’s it! You have successfully imported data to Pecan and can now start creating a predictive model.
If you want to import new data on an ongoing basis (incremental data collection), you can return to the “Schema” screen and click Import new data for any given table. Alternatively, you can click Import all data to re-import the entire table from scratch.