Snowflake Overview

Snowflake is a data warehouse built for the cloud. Read more about Snowflake.

Integrator includes several prebuilt flows optimized for Snowflake.

Connecting to Snowflake

To connect to Snowflake:

Step 1. Open the Connections window and click the + button.

Step 2. Type snowflake into the Search field.

Snowflake

Step 3. Select a Snowflake connection and continue by defining the connection parameters.

Snowflake connection properties

Read more about configuring database connections.

Read more about configuring a connection to Snowflake using the JDBC driver.

Snowflake flows

Using Snowflake-specific flows, you can extract data from any of the supported sources and load it directly into Snowflake.

A typical Snowflake flow performs the following operations:

  1. Extracts data from the source.
  2. Creates CSV or JSON files.
  3. Compresses files using the gzip algorithm.
  4. Copies files into an Amazon S3 bucket, which is used as a staging area for Snowflake.
  5. Checks whether the destination Snowflake table exists and, if it does not, creates it using metadata from the source.
  6. Executes the Snowflake COPY INTO command (see the sketch after this list).
  7. Cleans up the remaining files, if needed.
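For reference, the COPY INTO step (operation 6 above) typically amounts to a statement like the one below. This is a minimal sketch: the table name, bucket path, credentials, and file format options are placeholders, and Integrator builds and runs the actual command based on the flow parameters.

  -- Load a gzipped CSV file from the S3 staging area into a Snowflake table
  COPY INTO my_table
  FROM 's3://my-staging-bucket/my_file.csv.gz'
  CREDENTIALS = (AWS_KEY_ID = '***' AWS_SECRET_KEY = '***')
  FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1 COMPRESSION = GZIP);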

Start creating Snowflake flows by opening the Flow Builder, clicking the + button, and typing snowflake into the search field:

Snowflake flows

Continue by selecting the flow type and entering the flow parameters:

Snowflake flow

Sources and Destinations

Depending on the flow type, you can select any of the supported sources (FROM) for the Snowflake flow.

For most Snowflake flows, the destination (TO) is an Amazon S3 connection. Integrator uses Amazon S3 as a staging area for the files created when extracting data from the source (FROM).

Important: the destination (TO) name becomes the name of the file copied to the staging Amazon S3 bucket.

Note: when configuring the Amazon S3 connection that will be used as a staging area for Snowflake flows, it is recommended to select GZip as the value for the Archive file before copying to S3 field:

Snowflake S3

Final Destination (Snowflake Connection)

For all Snowflake flows, the final destination is Snowflake.

To configure the final destination, click the Connections tab and select the Snowflake connection created earlier.

Flow for a Snowflake connection

Mapping

If necessary, you can create a mapping between the source and destination (Snowflake) fields.

You create the mapping between the source and the destination just as you would for any other flow type.

Note: mapping is not required, but keep in mind that if a source field name is not supported by Snowflake, the load will fail with an error and the data will not be inserted into the database. For example, when loading data from Google Analytics, the output (source) includes fields with the ga: prefix (ga:user, ga:browser, etc.). Snowflake does not support field names containing a colon, so the data would be rejected. If that happens, you can use mapping to rename the destination fields:

Snowflake mapping
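For instance, ga:user and ga:browser could be mapped to ga_user and ga_browser. After renaming, the destination table contains only valid Snowflake identifiers, along the lines of the sketch below (the table name and column types are illustrative assumptions, not something the flow generates exactly as shown):

  -- Illustrative destination table after the ga: prefix has been mapped to ga_
  CREATE TABLE ga_report (
      ga_user    VARCHAR,
      ga_browser VARCHAR
  );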

Parameters

When creating a Snowflake flow, you must specify the required parameters, such as the Snowflake Table Name. You can also add optional parameters, transformations, and options:

Snowflake flow parameters

Depending on the flow type, other flow parameters can be added, as explained below:

Change Replication

As with any other flow type, it is possible to configure change replication.

When change replication is enabled, only the changed records will be loaded into Snowflake.

All you need to do is set the high watermark field and enable change replication for the transformation:

Snowflake change replication
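Conceptually, the high watermark turns a full extract into an incremental one. The sketch below shows the idea for a source table with an updated_at column; the column name and the literal watermark value are assumptions for illustration, and Integrator tracks and substitutes the actual watermark value for you.

  -- Full extract: SELECT * FROM report;
  -- Incremental extract driven by the high watermark field:
  SELECT *
  FROM report
  WHERE updated_at > '2024-01-01 00:00:00'   -- last recorded high watermark
  ORDER BY updated_at;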

Flow examples

Extract from database - load into Snowflake

This flow extracts data from a relational database table and loads it directly into a Snowflake table.

Step 1. Create and test the Snowflake connection as explained here.

Step 2. Create a connection to the Amazon S3 bucket to be used as a staging area, as explained here. When configuring the S3 connection, select GZip as the value for the Archive file before copying to S3 field.

Step 3. Create a connection to the source, a relational database, as explained here.

Step 4. Start creating the Snowflake flow as explained here.

Step 5. Select "Extract data from database, load into Snowflake".

Step 6. Continue by defining the source and destination as explained here.

Step 7. For the source (FROM), select the connection created in Step 3 and enter the table name as the source (FROM), for example report.

Step 8. For the destination (TO), select the connection created in Step 2 and enter the name of the CSV file as the destination (TO), for example report.csv.

Step 9. Select the Snowflake connection created in Step 1 for the final destination, as explained here.

Step 10. Continue by defining the mapping as explained here.

Step 11. Continue by defining parameters as explained here. The only field you will need to enter data into is the Snowflake Table Name.

Step 12. Configure change replication, if needed.

Step 13. Save and test the flow.

Extract from Google Analytics - load into Snowflake

This flow extracts data from the Google Analytics API and loads it directly into a Snowflake table.

Step 1. Create and test the Snowflake connection as explained here.

Step 2. Create a connection to the Amazon S3 bucket that will be used as a staging area, as explained here. When configuring the S3 connection, select GZip as the value for the Archive file before copying to S3 field.

Step 3. Create the CSV file format as explained here.

Step 4. Create a connection to the Google Analytics API as explained here.

Step 5. Start creating the Snowflake flow as explained here.

Step 6. Select "Extract data from API, load into Snowflake".

Step 7. Continue by defining the source and destination as explained here.

Step 8. For the source (FROM), select the connection created in Step 4 and enter report as the source (FROM).

Step 9. For the destination (TO), select the connection created in Step 2 and enter report.csv as the destination (TO).

Step 10. Select the Snowflake connection created in Step 1 for the final destination, as explained here.

Step 11. Continue by defining the mapping as explained here. In this particular case, you will need to rename the destination fields.

Step 12. Continue by defining the parameters as explained here. The only field you will need to enter data into is the Snowflake Table Name.

Step 13. Save and test the flow.

Load files into Snowflake

This flow loads CSV or JSON files directly into Snowflake. The files can be anywhere, so long as Integrator can read them: server storage, FTP, SFTP, cloud storage, etc.

Important: this flow is a bulk load only; the files must already exist.

Step 1. Create a source connection, which can be either file storage or cloud storage.

Step 2. Create and test the Snowflake connection as explained here.

Step 3. Create a connection to the Amazon S3 bucket that will be used as a staging area, as explained here. When configuring the S3 connection, select GZip as the value for the Archive file before copying to S3 field.

Step 4. Start creating the Snowflake flow as explained here.

Step 5. Select "Load files into Snowflake".

Load files into Snowflake

Step 6. Continue by defining the source and destination.

Step 7. For the source (FROM), select the connection created in Step 1 and enter a file name or a wildcard file name, for example *.csv.

Step 8. For the destination (TO), select the connection created in Step 3.

Step 9. Select the Snowflake connection created in Step 2 for the final destination, as explained here.

Step 10. Continue by defining the parameters. The only required parameter is Copy INTO SQL:

Copy INTO SQL
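As a rough example, the Copy INTO SQL for loading gzipped CSV files staged in the S3 bucket might look like the statement below. The table name, bucket, credentials, pattern, and error handling are placeholders; adjust them to match your files and file format.

  -- Load all staged *.csv.gz files from the S3 bucket into a Snowflake table
  COPY INTO my_table
  FROM 's3://my-staging-bucket/'
  CREDENTIALS = (AWS_KEY_ID = '***' AWS_SECRET_KEY = '***')
  PATTERN = '.*\.csv\.gz'
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 COMPRESSION = GZIP)
  ON_ERROR = 'ABORT_STATEMENT';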

Step 11. Optionally, define the other parameters.

Step 12. Save and test the flow.