Transformations Overview

Integrator supports a large number of ETL transformations. Most of them can be configured without writing a single line of code.

Common Transformations

Source-to-Destination

This is the most common transformation. Integrator extracts data from the source and loads it into the destination. Read more about the source-to-destination transformation.

Transformation name

Each source-to-destination transformation has either an automatically-generated or user-assigned name. Read more about the transformation name.

Mapping

A mapping transformation allows the user to map fields in the source to fields in the destination. Read more about mapping.
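
As a sketch of the general idea only (not Integrator's mapping editor or API), a mapping can be thought of as a set of source-to-destination field name pairs applied to every row. The field names below are hypothetical:

// Illustration only: rename source fields to destination fields.
// The mapping object and field names are hypothetical examples.
const mapping = { first: 'firstName', last: 'lastName' };
const source = [{ first: 'joe', last: 'doe' }];
const mapped = source.map(row =>
  Object.fromEntries(Object.entries(row).map(([field, value]) => [mapping[field] || field, value]))
);
// mapped => [{ firstName: 'joe', lastName: 'doe' }]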

SQL transformation

In Integrator, you can use SQL when extracting data from files, API endpoints, and social websites, as well as (obviously) databases. Read more about the SQL transformation.

Additional Transformations

Additional transformations can be used to filter, validate and modify a source dataset. Most of them are executed on each row of the dataset during the extract, but some can be performed after the extract on the entire dataset, or before the load.

Filter

It is possible to filter out certain rows of the source dataset using JavaScript. Read more about the filter transformation.

Note: this transformation is only available when the source is a file or API endpoint.
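
As an illustration of the idea only (not Integrator's actual scripting API), a filter is a JavaScript condition evaluated against each row; rows for which the condition is false are dropped. The field names below (country, age) are hypothetical:

// Illustration only: drop rows for which a JavaScript condition returns false.
// The field names (country, age) are hypothetical, not part of Integrator's API.
const rows = [
  { country: 'US', age: 34 },
  { country: 'CA', age: 17 },
  { country: 'US', age: 51 }
];
const kept = rows.filter(row => row.country === 'US' && row.age >= 18);
// kept contains the first and third rows only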

Order By

Integrator can reorder a source dataset by the specified fields. SQL-like Order By expressions, such as last_name, first_name desc, are supported.

Note: this transformation is only available when the source is a file or an API endpoint.

Validation

This transformation allows the user to configure rules for rejecting a row or an entire dataset, or for halting flow execution if validation fails. Read more about the validation transformation.
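
As a sketch of the idea only (the actual rule configuration is described in the linked article), a row-level validation rule accepts or rejects each row. The rule below is a hypothetical example:

// Illustration only: reject rows that fail a validation rule.
// The rule (email must contain "@") and field name are hypothetical.
const rows = [{ email: 'joe@example.com' }, { email: 'not-an-email' }];
const isValid = row => /@/.test(row.email);
const accepted = rows.filter(isValid);
const rejected = rows.filter(row => !isValid(row));
// accepted keeps the first row; rejected collects the second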

Remove Duplicates

When this transformation is engaged, Integrator checks whether a record with the same values in the comma-separated list of "unique fields" has already been processed, and ignores any subsequent matching records.
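
Conceptually (this is an illustration, not Integrator's implementation), the transformation keeps the first record seen for each combination of the unique field values. The field email below is a hypothetical unique field:

// Illustration only: keep the first record for each combination of "unique fields".
const seen = new Set();
const records = [
  { email: 'joe@example.com', name: 'Joe' },
  { email: 'jude@example.com', name: 'Jude' },
  { email: 'joe@example.com', name: 'Joseph' }
];
const deduplicated = records.filter(record => {
  const key = record.email;            // build the key from the unique fields
  if (seen.has(key)) return false;     // already processed: ignore
  seen.add(key);
  return true;
});
// deduplicated keeps Joe and Jude and drops the second joe@example.com record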

JavaScript transformation

This transformation allows the user to configure the rules for changing a dataset on the fly.

Read more about the JavaScript transformation.
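
As an illustration of the general idea only (the actual scripting API is described in the linked article), a per-row JavaScript transformation receives a row and returns a modified version of it. The field names below are hypothetical:

// Illustration only: modify each row on the fly.
// fullName is a hypothetical derived field; firstName and lastName are sample inputs.
const input = [{ firstName: 'Joe', lastName: 'Doe' }];
const output = input.map(row => ({
  ...row,
  fullName: (row.firstName + ' ' + row.lastName).toUpperCase()
}));
// output => [{ firstName: 'Joe', lastName: 'Doe', fullName: 'JOE DOE' }]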

Partition By

Partitioning is simply splitting a larger destination dataset into smaller "partitions". For example, if Partition By is set to 100 and the dataset has 1,000 records, then 10 files, each having 100 records, will be created. Or, if Partition By is set to last_name,first_name and the dataset has records with multiple, identical last and first names, then multiple files will be created, one for each unique combination of last and first name.
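
As an illustration of the numeric case only (not the product's code), splitting 1,000 records into partitions of 100 amounts to slicing the dataset into fixed-size chunks:

// Illustration only: split 1,000 records into partitions of 100.
const partitionSize = 100;
const dataset = Array.from({ length: 1000 }, (_, i) => ({ id: i + 1 }));
const partitions = [];
for (let i = 0; i < dataset.length; i += partitionSize) {
  partitions.push(dataset.slice(i, i + partitionSize));
}
// partitions.length === 10, each partition has 100 records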

Read more about the Partition By transformation.

Also, read how to configure the names of files when splitting a large destination dataset.

Note: this transformation is only available when the destination is a file.

Complex Transformations

Complex transformations are performed on the entire dataset, either right after the extract or before the load. When a complex transformation is engaged, streaming will be automatically disabled.

Extract Dimensions

When the source dataset is a complex nested data object, for example a response from a web service, it is possible to extract the inner dimension as a flat dataset.
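
For instance (an illustration only, with a made-up web service response), extracting the inner orders dimension from a nested object yields a flat dataset with one row per order:

// Illustration only: extract the inner "orders" dimension as a flat dataset.
// The response structure and field names are made up for this example.
const response = {
  customer: { id: 1, name: 'Joe Doe' },
  orders: [
    { orderId: 'A-1', total: 10.5 },
    { orderId: 'A-2', total: 99.0 }
  ]
};
const flat = response.orders.map(order => ({ customerId: response.customer.id, ...order }));
// flat => [{ customerId: 1, orderId: 'A-1', total: 10.5 }, { customerId: 1, orderId: 'A-2', total: 99 }]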

Read more about the Extract Dimensions transformation.

Note: this transformation is only available when the source is a file or an API endpoint.

Transpose dataset

When you transpose a dataset, the rows become columns and the columns become rows.

Original

first,last
joe,doe
hey,jude
the,who

Transposed

column_0,column_1,column_2
joe,hey,the
doe,jude,who
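
A minimal sketch of the same operation (purely illustrative, not the product's implementation):

// Illustration only: rows become columns and columns become rows.
// In the example above, the new columns are named column_0, column_1, and so on.
const original = [['joe', 'doe'], ['hey', 'jude'], ['the', 'who']];
const transposed = original[0].map((_, col) => original.map(row => row[col]));
// transposed => [['joe', 'hey', 'the'], ['doe', 'jude', 'who']]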

Wrap columns

When you apply this transformation, it keeps up to the Max # of Columns in a Row in each row and moves the remaining columns to the next row.

For example, when the Max # of Columns in a Row is set to 1:

Before

first,last
joe,doe
hey,jude
the,who

After

first
joe
doe
hey
jude
the
who
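
A minimal sketch of the same operation (purely illustrative, not the product's implementation), with Max # of Columns in a Row set to 1:

// Illustration only: wrap rows so that each row keeps at most 1 column.
const maxColumns = 1;
const rows = [['joe', 'doe'], ['hey', 'jude'], ['the', 'who']];
const wrapped = [];
for (const row of rows) {
  for (let i = 0; i < row.length; i += maxColumns) {
    wrapped.push(row.slice(i, i + maxColumns));
  }
}
// wrapped => [['joe'], ['doe'], ['hey'], ['jude'], ['the'], ['who']]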

Denormalize dataset

This transformation pivots data from multiple rows into columns in a single row.

Parameters

Columns to Group By - a comma-separated list of columns to group by. This parameter is a required field for the transformation.

Columns to Include - a comma-separated list of columns to include. For example, given the columns firstName, lastName, ssn, age, address, phone, if address and phone are specified as the only columns to include, then ssn and age will not be included in the final dataset.

Columns to Exclude - a comma-separated list of columns to exclude. For example, given the columns firstName, lastName, ssn, age, address, phone, if age and ssn are specified as columns to exclude, then ssn and age will not be included in the final dataset.

Leading Column - the name of the leading column. The leading column sets the pattern for the other columns. For example, if Address is the leading column and there are 2 addresses but 4 phones for a given first+last name, then only 2 phone columns will be included in the final dataset as well.

Max # of Columns - the maximum number of columns permitted. Use this parameter to set a limit on how many times denormalized columns can be repeated. For example, if there are 4 addresses for the given first+last name and the limit is set to 2, only the first 2 addresses will be included in the final dataset.

For example, with Columns to Group By set to id:

Before

id,address,phone
1,main stree1,412111111
1,anderson dr,412111112
2,home,
3,home,home phone
3,work,work phone
3,,mobile phone

After

id,address,address_2,address_3,phone,phone_2,phone_3
1,main stree1,anderson dr,,412111111,412111112,
2,home,,,,,
3,home,work,,home phone,work phone,mobile phone
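
The following sketch (illustrative only, not the product's code) shows the grouping and pivoting behind the example above:

// Illustration only: group rows by id and pivot repeating address/phone values
// into numbered columns (address, address_2, phone, phone_2, ...).
const rows = [
  { id: 1, address: 'main stree1', phone: '412111111' },
  { id: 1, address: 'anderson dr', phone: '412111112' },
  { id: 2, address: 'home', phone: '' }
];
const groups = new Map();
for (const row of rows) {
  if (!groups.has(row.id)) groups.set(row.id, { row: { id: row.id }, count: 0 });
  const group = groups.get(row.id);
  group.count += 1;
  const suffix = group.count === 1 ? '' : '_' + group.count;
  group.row['address' + suffix] = row.address;
  group.row['phone' + suffix] = row.phone;
}
const denormalized = [...groups.values()].map(group => group.row);
// denormalized[0] => { id: 1, address: 'main stree1', phone: '412111111',
//                      address_2: 'anderson dr', phone_2: '412111112' }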

Normalize dataset as key-value pairs

This transformation converts columns to key-value pairs, where the key is a column name and the value is a column value.

Parameters

Columns to Convert - a comma-separated list of columns to convert to key-value pairs. For example, given the columns firstName, lastName, ssn, age, if ssn and age are configured as columns to convert, the result is a dataset with the columns firstName, lastName, key, value, where the keys are ssn and age and the values are the values of those columns.

Key Column Name - the name of the column in the final dataset which will hold the keys.

Value Column Name - the name of the column in the final dataset which will hold the values.

For example, with Columns to Convert set to type,name, Key Column Name set to key, and Value Column Name set to value:

Before

id,type,name
1,first,joe
2,last,jude
3,last,who

After

id,key,value
1,type,first
1,name,joe
2,type,last
2,name,jude
3,type,last
3,name,who
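
A purely illustrative sketch of the same conversion (not the product's code): each configured column becomes a separate key-value row, while the remaining column (id) is carried over:

// Illustration only: convert the "type" and "name" columns into key-value rows.
const columnsToConvert = ['type', 'name'];
const rows = [
  { id: 1, type: 'first', name: 'joe' },
  { id: 2, type: 'last', name: 'jude' }
];
const normalized = rows.flatMap(row =>
  columnsToConvert.map(column => ({ id: row.id, key: column, value: row[column] }))
);
// normalized => [{ id: 1, key: 'type', value: 'first' }, { id: 1, key: 'name', value: 'joe' },
//                { id: 2, key: 'type', value: 'last' }, { id: 2, key: 'name', value: 'jude' }]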

Denormalize key-value pairs

This transformation groups a dataset by the specified fields and converts key-value pairs into columns, producing one row per group. The Key and Value are user-defined columns.

Parameters

Columns to Group By - a comma-separated list of columns to group by. This is a required field for the transformation. The Group by columns are automatically included in the final dataset.

Key Column Name - the name of the column which will be used as a key. Key Column Name is a required field for this transformation.

Value Column Name - the name of the column which will be used as a value. Value Column Name is a required field for this transformation.

Columns to Include - a comma-separated list of columns to include. These columns will be included in the final dataset, together with the Group By columns.

All Columns - an ordered comma-separated list of all columns in the final dataset. This field is optional.

For example, with Columns to Group By set to id, Key Column Name to phone_type, and Value Column Name to phone:

Before

id,phone_type,phone
1,cell,412111111
1,home,412111112
2,home,412111113
3,home,412111114
3,work,412111115
3,cell,412111116

After

id,cell,home,work
1,412111111,412111112,
2,,412111113,
3,412111116,412111114,412111115
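
Again as an illustration only (not the product's code), the grouping behind the example above can be sketched as follows: rows are grouped by id, and each phone_type value becomes a column holding the corresponding phone:

// Illustration only: group by id; each phone_type value becomes a column.
const rows = [
  { id: 1, phone_type: 'cell', phone: '412111111' },
  { id: 1, phone_type: 'home', phone: '412111112' },
  { id: 2, phone_type: 'home', phone: '412111113' }
];
const grouped = new Map();
for (const row of rows) {
  if (!grouped.has(row.id)) grouped.set(row.id, { id: row.id });
  grouped.get(row.id)[row.phone_type] = row.phone;   // the key value becomes the column name
}
const denormalized = [...grouped.values()];
// denormalized => [{ id: 1, cell: '412111111', home: '412111112' }, { id: 2, home: '412111113' }]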

Unique Transformations

Unique transformations are either specific to a certain application (such as Snowflake), a standard (such as HL7), or a use-case (such as change replication).

Snowflake

Snowflake is a data warehouse built for the cloud. Typically, to load data into Snowflake, the following operations must be performed:

  1. Extract data from the source.
  2. Create files in the CSV or JSON format.
  3. Copy files into an Amazon S3 bucket.
  4. Execute the COPY INTO command on multiple files.

Integrator hides the complexity of loading data into Snowflake by introducing a collection of highly-configurable Snowflake flows.

Read more about Snowflake flows.

Change replication

In Integrator, it is possible to create flows that track changes in a database or API and load them into another database or API.

Read more about support for change replication in Integrator.

HL7 transformations

HL7 is a set of data exchange formats and communication protocols used to connect healthcare applications.

Integrator supports numerous high-level HL7 transformations.

Read more about support for HL7 in Integrator.