Amazon Redshift connector

Easily connect to your Amazon Redshift data warehouse with Pecan, so you can create models using your existing data.

Ori Sagi avatar
Written by Ori Sagi
Updated over a week ago

Redshift is a fully managed data warehouse service in the cloud, designed to handle large-scale data sets and database migrations.

Pecan imports data from Amazon Redshift by using the UNLOAD command, which exports your data to an S3 bucket so it’s available to Pecan.

This process is automatic and only requires Pecan to have access to the relevant tables and S3 bucket.

The relevant steps are described below. If you need help with them, be sure to consult with your internal IT or DevOps team.

Prerequisite steps

  1. First, make sure you whitelist these Pecan IP addresses:

    54.246.108.184
    34.254.11.37
    54.216.8.168
    54.217.189.54
    54.73.234.235
    54.170.135.39
    79.125.57.39
  2. You’ll need to create an S3 bucket for Pecan to use. You can do so by following AWS instructions for creating a bucket. (Note that the S3 bucket will need to be in the same region as your Redshift instance.)

  3. Next, you’ll need to create an IAM user with read and write permissions, so Pecan has access to the S3 bucket:

    1. Log in to AWS Identity and Access Management (IAM) and create an IAM user using these steps. Make sure to save the access key and secret key – you’ll need them to create a connection in Pecan.

      To retrieve the AWS access key or secret key of an existing IAM role, see:
      Understanding and getting your AWS credentials
      Resetting lost or forgotten passwords or access keys for AWS

    2. Attach the relevant IAM policy to the user so Pecan has “programmatic access” to make API calls to your AWS bucket.

      To do so, copy and paste the below JSON text to your policy console in AWS. Make sure to change the <BUCKET_NAME> placeholder in Lines 11, 22, and 31 to match the actual bucket name you created in Step 2.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PecanKeyReadPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": "arn:aws:s3:::<BUCKET_NAME>"
        },
        {
            "Sid": "PecanKeyWritePermissions",
            "Effect": "Allow",
            "Action": ["s3:Get*",
                       "s3:List*",
                       "s3:Put*",
                       "s3:Delete*"],
            "Resource": "arn:aws:s3:::<BUCKET_NAME>"
        },
        {
            "Sid": "PecanBucketPermissions",
            "Effect": "Allow",
            "Action": ["s3:GetBucketLocation",
                       "s3:ListBucket"],
            "Resource": "arn:aws:s3:::<BUCKET_NAME>"
        }
    ]
}


To learn more, read about Policies and permissions in IAM.

4. Lastly, create a user for Pecan on your Amazon Redshift server by using the CREATE USER command. This will provide Pecan with read access to the tables that will be used to build your model.

If you’re not sure which tables will be needed, you can simply provide read access to all tables. You’ll be able to decide which tables to import when you import your data to Pecan.

How to configure an Amazon Redshift connection in Pecan

  1. Log in to Pecan, select the “Connections” tab, and click Add connection.

  2. Select “Amazon Redshift” and complete the following fields:

    • Connection name – This is how you’ll identify the connection when creating and working with models on the platform.

      • Names should be unique and reflect the data source and what’s stored in it.

      • Valid characters include letters, numbers, and underscores (e.g. “paid_downloads_fall_2021”).

      • Note that connection names can’t be changed once created.

    • Connection type – Select "Read" if the connection will be used to import data into Pecan. Select "Write" if it will be used to export predictions from Pecan to Amazon Redshift.

    • Endpoint – This is the address of your database (e.g. my_db.eu-west-1.redshift.amazonaws.com, 96.127.44.11).

    • Port – Provide the IP addresses of your server and connection port.

    • Database – This is the name of the database Pecan should use.

      • Servers can hold many databases. If you want to access data from multiple databases that sit on the same server, you’ll need to create a new Pecan connection for each database.

    • Username and Password - You will have created these on your Redshift server during Step 4 of the above prerequisite steps.

    • Temp bucket – The name of the bucket you created during Step 2 of the prerequisite steps (above), without any prefix (e.g. “my_s3_bucket” instead of “S3://my_s3_bucket”).

    • AWS access key – For the bucket you created during Step 2 above.

    • AWS secret key – For the bucket you created during step 2 above.

    • IAM role (optional) – Specify an IAM role to be assumed (see AWS documentation).

  3. Now, click Test connection to make sure everything is working correctly. Then click Create connection to complete the setup. (For more information, see Testing and creating a data connection.)

How to enable exporting predictions

Once a connection is set up, it can be used to export Pecan predictions back to your Amazon Redshift server.

To enable this functionality:

  1. Create an empty table on your Redshift server that will receive Pecan predictions using a name like “pecan_predictions_table”.

  2. Give the Pecan user (which you created in Redshift) write access to this table. The columns and values that will be written to it will be defined when you generate predictions for the first time.

Alternatively:

  1. Allow the Pecan user to create new tables, and Pecan will automatically create this table for you, defining the columns and inserting predictions.

  2. If you do this, there’s no need to give write access to the Pecan user.

Did this answer your question?