Transformation: Source to destination transformation
A typical transformation includes a source (FROM) and a destination (TO). The source and destination can be any data source supported by Etlworks: tables in a relational database, SQL statements, files, web services, API endpoints, or social media websites.
Where to use:
- To extract data from the source, transform it, and load it into the destination
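
For illustration, a minimal sketch of the same FROM/TO idea outside of Etlworks, using pandas with a hypothetical CSV source and SQLite destination:

```python
# Minimal sketch of a source (FROM) -> destination (TO) move using pandas.
# "orders.csv" and "warehouse.db" are hypothetical names, not Etlworks objects.
import sqlite3
import pandas as pd

df = pd.read_csv("orders.csv")        # extract from the source
df["amount"] = df["amount"].round(2)  # transform in flight
with sqlite3.connect("warehouse.db") as con:
    df.to_sql("orders", con, if_exists="replace", index=False)  # load
```
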
Transformation: Mapping
The drag-and-drop mapping editor allows mapping sources to destinations and source columns to destination columns. The data engineer can configure per-column transformations and map source data types to destination data types.
Where to use:
- To map sources to destinations
- To map source columns to destination columns
- To configure per-column transformations
- To map source data types to destination data types
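
The same idea sketched in pandas: a hypothetical column map that renames source columns and casts them to destination data types (not the mapping editor's format):

```python
# Hypothetical column map: source column -> (destination column, destination dtype).
import pandas as pd

column_map = {
    "cust_name": ("customer_name", "string"),
    "order_ts":  ("ordered_at",    "datetime64[ns]"),
    "amt":       ("amount",        "float64"),
}

df = pd.DataFrame({"cust_name": ["Ada"], "order_ts": ["2024-01-05"], "amt": ["19.99"]})
renamed = df.rename(columns={src: dst for src, (dst, _) in column_map.items()})
mapped = renamed.astype({dst: dtype for dst, dtype in column_map.values()})
```
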
Transformation: SQL transformation (source query)
Use SQL to extract data from relational databases or to transform data by running SQL on unstructured or semi-structured data sources.
Where to use:
- To extract data from relational databases or transform data by running SQL on unstructured or semi-structured data sources, such as a file, a response from an API, or a data feed from a social network
- To flatten nested datasets
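
A minimal sketch of the concept, assuming a hypothetical CSV file loaded into an in-memory SQLite table so it can be queried with SQL (Etlworks runs such queries inside the Flow instead):

```python
# Run SQL over a flat file: load a CSV into an in-memory SQLite table, then query it.
import csv
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id TEXT, event TEXT, amount REAL)")
with open("events.csv", newline="") as f:
    rows = [(r["user_id"], r["event"], float(r["amount"])) for r in csv.DictReader(f)]
con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

for row in con.execute(
    "SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id"
):
    print(row)
```
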
Transformation: Set operations
Set operations combine the results of multiple queries into a single result set. Supported operators include JOIN, UNION, INTERSECT, and EXCEPT.
Where to use:
- To aggregate data from multiple data sources
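
Illustrative pandas equivalents of the set operators (not Etlworks syntax):

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2, 3]})
b = pd.DataFrame({"id": [2, 3, 4]})

union     = pd.concat([a, b]).drop_duplicates()  # UNION
intersect = a.merge(b, on="id")                  # INTERSECT (also an inner JOIN)
except_   = a[~a["id"].isin(b["id"])]            # EXCEPT
```
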
Transformation: Filter
Etlworks can filter out rows of the source dataset using a simple expression.
Where to use:
- To filter out rows of the source dataset using a simple expression or JavaScript
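
A pandas sketch of an expression-based row filter (the expression syntax here is pandas', not Etlworks'):

```python
import pandas as pd

df = pd.DataFrame({"amount": [50, 250, 900], "country": ["US", "US", "DE"]})
# Keep only rows matching the expression.
filtered = df.query("amount > 100 and country == 'US'")
```
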
Transformation: Order-by
Etlworks can reorder a source dataset by the specified fields. SQL-like Order By expressions are supported.
Where to use:
- To reorder a source dataset by the specified fields using SQL-like Order By expressions
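
The equivalent of a SQL-like "ORDER BY country ASC, amount DESC", sketched in pandas with hypothetical fields:

```python
import pandas as pd

df = pd.DataFrame({"country": ["US", "DE", "US"], "amount": [50, 250, 900]})
ordered = df.sort_values(by=["country", "amount"], ascending=[True, False])
```
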
Transformation: Validation
This transformation allows the user to configure rules for rejecting a row or an entire dataset, or for halting the Flow execution if validation fails.
Where to use:
- To reject rows and entire datasets based on configurable rules
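
A sketch of row-level and dataset-level validation rules (the rules and names here are hypothetical, not Etlworks' rule syntax):

```python
import pandas as pd

def validate(df: pd.DataFrame) -> pd.DataFrame:
    valid = df[df["email"].str.contains("@", na=False)]  # row-level rule: reject bad rows
    if valid.empty:                                      # dataset-level rule: halt the run
        raise RuntimeError("validation failed: no valid rows, halting the flow")
    return valid

clean = validate(pd.DataFrame({"email": ["a@b.co", "broken"]}))
```
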
Transformation: Remove duplicates
When the Remove Duplicates transformation is configured, the Flow checks whether a record with the same values in the comma-separated list of unique fields has already been processed, and ignores all subsequent matching records.
Where to use:
- To remove duplicated rows
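
The same behavior sketched in pandas; the hypothetical email column stands in for the comma-separated list of unique fields:

```python
import pandas as pd

df = pd.DataFrame({"email": ["a@b.co", "a@b.co", "c@d.co"], "amount": [1, 2, 3]})
# Keep the first record per unique key; ignore later matches.
deduped = df.drop_duplicates(subset=["email"], keep="first")
```
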
Transformation: Partition-by
Use this transformation to split a larger dataset into smaller partitions.
Where to use:
- To split by the maximum number of records in a partition
- To split by unique values of the partition-by fields
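
Both partitioning modes sketched in pandas (hypothetical field and partition size):

```python
import pandas as pd

df = pd.DataFrame({"country": ["US", "DE", "US", "FR"], "amount": [1, 2, 3, 4]})

# 1) Split by a maximum number of records per partition (here: 2).
size_parts = [df.iloc[i:i + 2] for i in range(0, len(df), 2)]

# 2) Split by unique values of the partition-by field(s).
value_parts = {key: part for key, part in df.groupby("country")}
```
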
Transformation: Extract dimensions
This transformation extracts a flat dimension from a complex nested dataset.
Where to use:
- To flatten nested datasets
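
A flattening sketch using pandas.json_normalize on a hypothetical nested document:

```python
import pandas as pd

orders = [{"id": 1, "customer": {"name": "Ada", "address": {"city": "London"}}}]
flat = pd.json_normalize(orders)
# Columns: id, customer.name, customer.address.city
```
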
Transformation: Transpose
When you transpose a dataset, the rows become columns and the columns become rows.
Where to use:
- To turn the rows of a dataset into columns and its columns into rows
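
A transpose sketch in pandas:

```python
import pandas as pd

df = pd.DataFrame({"metric": ["clicks", "views"], "mon": [10, 100], "tue": [12, 90]})
# Rows become columns and columns become rows.
transposed = df.set_index("metric").T
```
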
Transformation: Wrap columns
This transformation leaves up to the configured Max # of Columns in a Row in one row and moves the remaining columns to the next row.
Where to use:
- To wrap extra columns to the next row
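
A plain-Python sketch of the wrapping logic, with a hypothetical max_cols standing in for Max # of Columns in a Row:

```python
# Keep at most max_cols values per row and move the rest to the next row.
def wrap_columns(row, max_cols):
    return [row[i:i + max_cols] for i in range(0, len(row), max_cols)]

print(wrap_columns(["a", "b", "c", "d", "e"], max_cols=2))
# [['a', 'b'], ['c', 'd'], ['e']]
```
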
Transformation: Denormalize (pivot)
This transformation pivots data from multiple rows into columns in a single row.
Where to use:
- To pivot data from multiple rows into columns in a single row
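
A pivot sketch in pandas with hypothetical region/quarter columns:

```python
import pandas as pd

sales = pd.DataFrame({
    "region":  ["east", "east", "west"],
    "quarter": ["q1", "q2", "q1"],
    "amount":  [100, 120, 80],
})
# One row per region; a column per quarter.
pivoted = sales.pivot(index="region", columns="quarter", values="amount")
```
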
Transformation: Normalize dataset as key-value pairs
This transformation converts columns to key-value pairs, where the key is a column name and the value is a column value.
Where to use:
- To convert columns in a dataset to key-value pairs
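
A sketch of the normalization using pandas.melt, which produces one key-value pair per column:

```python
import pandas as pd

df = pd.DataFrame({"id": [1], "name": ["Ada"], "city": ["London"]})
kv = df.melt(id_vars=["id"], var_name="key", value_name="value")
# id | key  | value
#  1 | name | Ada
#  1 | city | London
```
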
Transformation: Denormalize key-value pairs
This transformation groups a dataset by the specified fields and converts the key-value pairs to columns and rows. The Key and Value are user-defined columns.
Where to use:
- To group a dataset by the specified fields and convert key-value pairs to columns and rows
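
The inverse operation sketched in pandas: group by the hypothetical id field and pivot the key/value columns back into regular columns:

```python
import pandas as pd

kv = pd.DataFrame({"id": [1, 1], "key": ["name", "city"], "value": ["Ada", "London"]})
wide = kv.pivot(index="id", columns="key", values="value").reset_index()
# id | city   | name
#  1 | London | Ada
```
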
Transformation: Scripting transformations
A scripting transformation changes a dataset on the fly. You can use JavaScript or Python in scripting transformations.
Where to use:
- To transform the dataset on the fly using JavaScript or Python
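
The general shape of a per-row scripting transformation, sketched as a plain Python function (Etlworks exposes its own scripting objects, documented separately):

```python
# Hypothetical per-row hook: derive a new field from existing ones.
def transform_row(row: dict) -> dict:
    row["full_name"] = f'{row["first_name"]} {row["last_name"]}'
    return row

print(transform_row({"first_name": "Ada", "last_name": "Lovelace"}))
```
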
Transformation: Create nested documents using a scripting language
This transformation provides an API for creating complex nested documents.
Where to use:
- To create complex nested JSON, XML, Avro, and Parquet documents
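
For flavor only, a generic way to assemble a nested document in plain Python; the actual Etlworks API for nested documents is different and covered in its documentation:

```python
import json

order = {
    "id": 1,
    "customer": {"name": "Ada", "address": {"city": "London"}},
    "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
}
print(json.dumps(order, indent=2))
```
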
Transformation: Preprocessor
This transformation transforms raw source messages using a scripting language.
Where to use:
- To transform raw source messages using a scripting language
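
A preprocessor sketch: strip a hypothetical envelope from a raw message before parsing it (real preprocessors depend on the source format):

```python
import json

def preprocess(raw: str) -> dict:
    cleaned = raw.strip().removeprefix("data:")  # strip a hypothetical envelope
    return json.loads(cleaned)

print(preprocess('data:{"id": 1}'))
```
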
Transformation: Randomization and anonymization
Anonymization is a data processing technique that removes or modifies personally identifiable information; it results in anonymized data that cannot be associated with any one individual. Randomization is the process of making something random, for example, generating random numbers. Etlworks supports both techniques via scripting (JavaScript and Python).
Where to use:
- To apply an anonymization and/or randomization algorithm to the value of a source column
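
A sketch of both techniques in Python: hashing as a simple anonymization step and random noise as randomization (illustrative only; choose algorithms that meet your actual privacy requirements):

```python
import hashlib
import random

def anonymize(value: str) -> str:
    # Replace an identifying value with a one-way hash digest.
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def randomize(amount: float) -> float:
    # Perturb a numeric value with random noise of +/- 10%.
    return round(amount * random.uniform(0.9, 1.1), 2)

print(anonymize("ada@example.com"), randomize(100.0))
```
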
Transformation: XSL
This transformation transforms one or multiple XML files by applying an XSL style sheet.
Where to use:
- To transform one or multiple XML files by applying an XSL style sheet
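
A common way to apply an XSL style sheet in Python, using lxml (Etlworks applies the style sheet inside the Flow instead):

```python
from lxml import etree

xml = etree.XML("<order><id>1</id></order>")
xslt = etree.XML("""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/order">
    <orderId><xsl:value-of select="id"/></orderId>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(xslt)
print(str(transform(xml)))
```
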