The User Interface

Running Integrator in multiple browser tabs

Problem: can I open Connections and the Flow Builder at the same time, in different browser tabs?

Solution: on the left navigation bar, right-click the App icon (Connections, Flow Builder, etc.) and select Open link in new tab. You can open as many tabs as you need.

Organizing flows, connections and schedules

Problem: we have more than 100 flows. Is there any way to organize them into some sort of "folder" arrangement, as in a file system?

Solution: use tags to organize flows, connections, formats, listeners and schedules.

Sticky filters

Problem: even after we have tagged most of our flows, it is still difficult to navigate through them unless we filter out some of them. Unfortunately, the filters are reset whenever we navigate between pages or log out of Integrator.

Solution: use a sticky filter to "remember" your full text and tag filters.

Collapsible tags

Problem: is there any way to display only the tags in the grids, such as in Connections, Flows, etc?

Solution: enable expandable tags, under the Welcome link.

Disable notifications

Problem: we have tons of flows running at the same time, so the Flow executed successfully notifications can become really annoying. Is there any way to disable them?

Solution: Click disable flow execution notifications, under the Welcome link.

Parameterization

Parameterization is a way to dynamically configure connections and transformations based on input parameters. The input parameters can be global variables or flow variables.

Global Variables

Global variables are key-value pairs that can either be set by JavaScript code or set automatically when executing data integration flows.

Examples of global variables that are set automatically when executing data integration flows:

Source filename

Destination filename

Database loop parameters

Flow Variables

Flow variables are passed as URL parameters or URL variables to the user-created API endpoints.

Flow variables can also be defined when creating nested flows.

Flow variables can be used in source and destination queries or accessed programmatically.

Referencing Variables

To reference a global variable, use the following syntax: {variable name}.

Important: the value of the variable must not be empty. When Etlworks Integrator replaces the {variable name} token with an actual value, it ignores null and empty values.

Examples:

Input: http://host.com?startDate={startDate}
Variable value: startDate = null
Output: http://host.com?startDate={startDate}

Input: http://host.com?startDate={startDate}
Variable value: startDate = ""
Output: http://host.com?startDate={startDate}

Input: http://host.com?startDate={startDate}
Variable value: startDate = today
Output: http://host.com?startDate=today

Parameterization of the connections

Almost all connection parameters can be configured using {tokens}. Just enter the {variable name} into any field. See examples here: dynamic URLs.

Parameterization of transformations

FROM (source) and TO (destination) fields for a particular transformation can be configured using variables. Just enter the {variable name} anywhere within the FROM/TO field.

variable filename to write

Parameterization of the source and destination queries

You can use {tokens} anywhere in the source and destination queries.

SQL with parameters

In addition to global variables and flow variables, the destination query can contain {source field names}.
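For illustration, a destination query might look like the sketch below. This is only a sketch: the orders table and column names are hypothetical, {loadDate} stands for a global or flow variable, and {order_id} and {amount} stand for source field names that Integrator substitutes for each record.

insert into orders (order_id, amount, load_date)
values ({order_id}, {amount}, {loadDate})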

Working with files

Server storage

Problem: most of our files are in the cloud (or remote file storage: FTP, FTPS, SFTP, etc.). Is there any way to copy/move them to some sort of "local" storage (hard drive or local network) so that Integrator can access them directly?

Solution: just create a Server Storage connection and use it to work with the files.

Server storage

When creating a Server Storage connection you typically specify:

A Directory - a folder name. For example: /usr/local/test. Read more about the concept of a home folder.

File(s) - filename or a wildcard filename. For example: user.json or *.json.

app data

Note: Integrator concatenates Directory and File(s) to create an actual full path to the file(s). For example: /usr/local/test/user.json.

Read more about the Server Storage connection.

Important: when using Integrator in the cloud, each customer is given personal hard drive space. A customer is free to create subfolders, but the Server Storage connection is limited to folders under the Home Folder. If you install Integrator on premises, you can access files on any available network or local storage device.

Working with wildcard filenames

Problem: I'm creating a flow which reads a file and loads data into a database, web service, or another file, but I don't know the exact source filename upfront. Is it possible to use a wildcard filename, such as *.xml, myfile??.*, etc.?

Solution: In Integrator, you can use wildcard filenames for almost any type of source connection. All you need to do is specify the order in which the files are to be processed.

Important: by default, Integrator processes only one file at a time. For example, if there are multiple files which match the wildcard *.xml and oldest has been selected as the algorithm for wildcard file processing, Integrator will process the oldest file in the folder which matches the wildcard. Read how to process all files that match a wildcard filename.

Step 1. When creating a connection, select one of the available algorithms for wildcard file processing using the Enable Wildcard File Name field:

Enable a Wildcard File Name

Step 2. Create a flow where the source (FROM) is a file and the destination (TO) is a database, another file or a web service. When creating the transformation, enter a wildcard filename, for example Inbound*.csv, into the FROM field.

Flow with wildcards

Source filename

In a FROM/TO transformation, the following rules are used to determine the source (FROM) filename:

Accessing a source filename set as a variable

Read how to access a source filename set as a global variable.

Accessing a destination filename set as a variable

Read how to access a destination filename set as a global variable.

Destination filename is the same as the source filename

Problem: I'm processing files using a wildcard filename. How can I make sure that the destination filename is the same as the source filename?

Solution: enter * into the TO field for the transformation.

Destination filename equals the source filename

Calculate the destination filename

In a source-to-destination transformation, if any part of the TO field contains the macro replace(regex, replacement), the system will perform a regular expression replacement on the enclosed part of the string and concatenate it with the left and right parts of the string.

Read more about calculating a destination filename.

Creating files with unique names

Problem: I want to make sure that each time a file is created by a flow, it has a unique name.

Solution: in Integrator, you simply specify what suffix should be used when a new file is created.

Important: once a suffix is enabled for a connection, all files created using that connection will have the specified suffix.

Step 1. When creating a connection, select one of the available types of suffixes, using the Add Suffix When Creating Files field:

Unique filenames

Step 2. You are done. All files created using this connection will have the specified suffix.

File Template

Problem: I'm creating a flow where the destination is a file. I'd like to use a visual designer to map the source columns to the destination columns. However, there is no destination file yet (it is about to be created but doesn't exist yet). Is there any way to create a template, which will be used to populate the destination columns?

Solution: when configuring any file format, use the Template field to specify columns and data types. For example (for CSV): first_name,last_name,dob.

File template

Important: the template must use the same file format as the destination file: JSON for JSON files, CSV for CSV files, etc. Integrator parses the template using the parameters configured for that format.

Modify content of any document before passing it to the transformation

Problem: I would like to change the content of my XML, JSON or CSV document before passing it to the transformation as a source. For example, the source document is badly malformed and requires preprocessing before Integrator can read it as standard XML, JSON or CSV.

Solution: you will need to select Preprocessor as the Transformation type and use JavaScript to modify the original message.

Preprocessor

Step 1. Create an XML, JSON or CSV format.

Step 2. When creating the format, select Preprocessor as the Transformation type.

Step 3. Enter JavaScript code into the Preprocessor field which will modify the source document. You can reference the original source message using the message variable.

// Assign the modified document to the value variable (the same convention as in the raw-data example later in this document).
value = message.substring(12);

Important: you can use a mix of JavaScript and Java methods to modify the original message.

Process the file and then delete it

Problem: I'd like to create a flow which transforms the source file (such as changing it into a different format) and then deletes it, if it has been processed successfully.

Solution: you will need to use JavaScript, a Delete files flow and a Nested flow to combine it all together.

Step 1. Create a source connection.

Step 2. Create a flow which reads the source file and transforms it to any destination, as you normally would.

Step 3. When creating the flow, select a transformation, click the Mapping button and open the Parameters tab.

Step 4. On the Parameters tab, scroll down and check the Execute Transformation Rules after extract checkbox.

Step 5. Enter the following JavaScript code into the Transformation Rules field.

importPackage(com.toolsverse.config);
// FilenameUtils is part of Apache Commons IO
importPackage(org.apache.commons.io);

SystemConfig.instance().getProperties().put('json2csv', FilenameUtils.getName(dataSet.getFileNameToRead()));

Note: this code stores the source filename under system properties.

Important: use any unique key in place of 'json2csv'.

Memorize the source filename

Step 6. Start creating a Delete file flow. Just type delete file into the Select Flow Type dialog box.

Step 7. When creating the Delete file flow, select the connection created in Step 1 and enter the key created in Step 5, enclosed in curly brackets, into the FROM field. For example: {json2csv}.

Delete a file flow

Step 8. Create a nested flow which combines the flows created in Steps 2-5 and Steps 6-7.

Process all files in a folder (built-in)

Problem: how can I create a flow which will process all the files in a folder? The simplest way possible, please.

Solution: just create a flow with a transformation which supports the Process all files option.

Let's assume that our source folder contains thousands of CSV files that we just want to load into the database.

Step 1. Create a connection for the source files. Make sure the connection supports wildcard filenames.

Step 2. Create a destination database connection.

Step 3. Create a flow that reads a CSV file and loads it into the database. When creating a transformation, enter the same wildcard filename into the FROM field as you specified in the source connection. You can have more than one transformation in the flow.

All files flow

Step 4. Click the MAPPING button and select the Parameters tab. Check Process all files.

Process all files option

Important: files will remain in the source folder after processing, so you might want to consider deleting or moving them to a different folder.

Process all files in a folder (using a wildcard filename)

Problem: how can I create a flow which will process all the files in a folder? I want it to be simple but still able to execute complex flows (as opposed to a single transformation, as in the example above).

Solution: use a nested flow with the Loop type set to Files by wildcard.

Step 1. Create a connection for the source files. Make sure the connection supports wildcard filenames. Make a note of the connection name.

Step 2. Create a flow where the source connection is the connection created in Step 1 and the destination is whatever you like.

Step 3. Create a nested flow and add the flow created in Step 2.

Step 4. Click the Pen icon next to the flow name. The Parameters pop-up window will open.

Wildcard loop

Step 5. Select Files by wildcard for Loop type.

Step 6. Select the connection created in Step 1 for Connection.

Step 7. Enter a wildcard filename, such as *.csv, for the Files path.

Important: files will remain in the source folder after processing, so you might want to delete or move them to a different folder.

Process all files in a folder (JavaScript)

Problem: how can I create a flow which will process all the files in a folder, one-by-one, and then delete the processed files? I'm aware of the simple and wildcard options, but I need more control.

Solution: you will need to create a nested flow which runs another flow in a loop until there are no more files left in the folder.

Let's assume that our source folder contains a thousand JSON files that we want to compile into a single CSV file. The goal is to convert all JSONs in the folder into CSVs and then merge the CSVs into a single file.

Step 1. Create a connection for the source files. Make sure that the connection supports wildcard filenames. Make a note of the connection name.

Step 2. Create a connection for the destination files. Make sure the connection supports unique filenames. Important: when choosing an algorithm for creating unique filenames, use either uuid or milliseconds. Timestamp-based algorithms can cause multiple files to be created with the same name, because their precision is one second, which may be lower than the rate at which the files are created.

Step 3. Create a flow which reads a JSON file, creates a CSV file and deletes (or moves) the source file. Important: the key here is to delete (or move) the source file after it has been processed. Follow these instructions to create a flow which processes the source file and then deletes or moves it to a different folder.

Step 4. Create a nested flow which includes the flow created in Step 3.

Step 5. When creating a nested flow, click the Pen icon.

Process all files

Step 6. Select Files by wildcard as the Loop type.

Step 7. Define the loop conditions.

Define the loop conditions

Step 8. When defining the loop conditions, make sure you use the connection name from Step 1:

importPackage(com.toolsverse.etl.common);
importPackage(com.toolsverse.etl.core.task.common);

// FileManagerTask.filesExist(...) checks to see if there are any files in the folder 
// represented by the connection (alias), which matches the wildcard filename. 
// 'Multi source' is the connection name from Step 1.
// The loop will continue while the return value is not null.
value = !FileManagerTask.filesExist(etlConfig, 'Multi source', '*.json') ? null : "true"; 

Step 9. Optionally define the Maximum number of iterations. This parameter is designed to prevent the system from creating infinite loops when the loop conditions are not properly defined. The default value is 10,000. By setting this value to <= 0 (for example, to -1) you can disable the prevention mechanism altogether, so be careful.

Step 10. Create a flow which merges the CSV files created (in the loop) by the flow in Step 3 into a single file. Start creating the flow by typing merge CSV into the Select Flow Type dialog window.

Merge CSV files

Step 11. When creating the merge CSV flow, use the connection created in Step 2 for both the source and the destination. Use the same format as in the flow created in Step 3. Your FROM is going to be the wildcard filename for the CSV files created by the flow in Step 3, and your TO is going to be the name of the CSV file compiled by merging the original CSV files.

Merge FROM / TO

Step 12. Create a flow which deletes the CSV files.

Delete files

Step 13. Create a nested flow which combines the flows created in Step 4 (convert JSON to CSV, repeating until there are no files left), Steps 10-11 (merge the CSV files) and Step 12 (delete the original CSV files).

Splitting the datasets into multiple files based on the value of a column

Problem: Is there any way to split a file into multiple files, based on the value of a column? For example, I'm creating a CSV file where the first column is ORDER ID. There might be multiple rows with the same ORDER ID. I want to split my CSV file so that each output file contains only rows with the same ORDER ID.

Solution: enter a comma-separated list of fields to split by, into the Partition By field. Using the example above, Partition By = ORDER ID.

Important: the files will be created using the following algorithm: original filename + _ + value of the column to partition by + original file extension. For example, with the destination filename order.csv and ORDER ID 1234, the created file will be order_1234.csv.

partition by

Note: you can split the file into any of the formats supported by Integrator.

Using the column value as a destination filename, when splitting datasets into multiple files

Problem: we used the Partition By transformation to split a dataset into multiple files. Unfortunately, it created files with names that included the original destination filename, for example: order_1234.csv, where 1234 is the value of the Partition By column. What we really need is just 1234.csv. Is there any way to do this?

Solution: enable the Ignore Original File Name property under Mapping/Additional Transformations/Partition.

ignore original filename

Working with JSON files

Working with large JSON files

In Etlworks Integrator, it is possible to work with very large files. Read how to split files and, specifically, split JSON files.

Working with CSV files

Working with large CSV files

In Etlworks Integrator, it is possible to work with very large files. Read how to split files and, specifically, split CSV files.

Reading CSV files with multiline records.

It is possible that your source CSV file has records (rows) that span multiple lines.

first,last,description,dob
Joe,Doe,"just a
guy","0102000"
Brad,Pit,"an actor","0101965"

As you can see, for some of the records, the value of the description field occupies more than one line.

To handle this, just enable Has multiline records when configuring the CSV format.

enable multiline records

Reading non-standard CSV files.

It is possible that your source CSV file is not properly formatted or does not conform to the CSV format standard.

Problem: there are empty lines:

first,last,age
john,doe,23
...empty line...
mad,max,45

Solution: when configuring the CSV format, select the Skip Empty Rows check box.

Problem: the header does not conform to the CSV format:

just a header
john,doe,23
mad,max,45

Solution: when configuring the CSV format, select the Skip First Row check box and also (typically) Use First Row for Data.

Problem: some of the lines in the file have a different number of columns:

first,last,age
john,doe,23
wrong,number,of,columns
mad,max,45

Solution: when configuring the CSV format select the Skip not Properly Formatted Rows check box.

Problem: the header has fewer columns than the data rows:

first,last
john,doe,23,01/01/2000
mad,max,45,01/03/1977

Solution: when configuring the CSV format, select the Document has extra data columns check box and deselect Use First Row for Data.

Using filters with CSV files

Problem: the source CSV file has a completely irregular structure and cannot be parsed by a standard parser, no matter what we do. In the example below, the first line must be ignored, the second line contains a "header" record, and the lines from the third one downward contain "items" with a different number of columns:

HeaderInfo
XYZ 2|LLL001|MMM0124386|BULK|17/01/11|0000002205|TIGER WHEEL|SHOPPING CENTRE BRITS NORTH WEST|0250|0122527777|2|
77777PYT12P515012MBK|TEST 9.0X17 5/150 ET12|4|5013562|16/12/29|3|554981
77777ROG20P611466MBK|TEST 9.0X17 6/114 ET20|4|5013562|16/12/29|6|554981

Solution: use a Filter when configuring the CSV format.

Example of a Filter

Let's assume that the source file has a structure like the one above. Let's also assume that we need to read the CSV file and split it into a "header", which is line 2:

XYZ 2|LLL001|MMM0124386|BULK|17/01/11|0000002205|TIGER WHEEL|SHOPPING CENTRE BRITS NORTH WEST|0250|0122527777|2|

and "items", from line 3 downward look like this:

77777PYT12P515012MBK|TEST 9.0X17 5/150 ET12|4|5013562|16/12/29|3|554981
77777ROG20P611466MBK|TEST 9.0X17 6/114 ET20|4|5013562|16/12/29|6|554981

The end-goal is to parse the source file and store the "header" part of the file in an order table in the database and the "items" part in an item table.

To parse files like the one above, we will create two different CSV formats, one for the "header" and one for the "items".

Step 1. Create a format for the "header":

// Stop parsing after the first record has been read - only the "header" record is processed.
if (filter.getIndex() > 1) {
   filter.setCode(TextFilter.TextFilterCode.STOP);
}

Step 2. Create a format for the "items":

// Reject (skip) the first physical line of the file and keep the record counter at 1.
if (filter.getActualIndex() == 1) {
   filter.setCode(TextFilter.TextFilterCode.REJECT);
   filter.setIndex(1);
}

Step 3. Follow this example to load data into 2 tables from a single source file.

Using JavaScript in a Filter

In the examples above, we used the filter variable to 1) stop parsing the file and 2) skip lines in the file.

The following getters and setters (accessors and mutators) are available for the filter object:

// Sets the code which controls the parsing: 
// Possible code values: 
// TextFilter.TextFilterCode.STOP - stop the parsing
// TextFilter.TextFilterCode.REJECT - skip the line
// TextFilter.TextFilterCode.CONTINUE - continue parsing, default value
filter.setCode(TextFilter.TextFilterCode code); 

// Returns the current line number.  
filter.getIndex();

// Sets the current line number.  
filter.setIndex(index);

// Returns the actual, current line number. It is different from the line number if Use first Row for Data is enabled  
filter.getActualIndex();


// Returns the current line.  
filter.getLine();

// Sets the current line.  
filter.setLine(line);

Creating CSV files with a non-standard header

Problem: I need to create a CSV file with a header different from the data section. For example, the header (first line) has 5 comma-separated fields while the data section (lines below first line) has 15.

Solution: create a CSV file without a header, then use a Merge CSV Files flow to add a non-standard header.

Step 1. Create a CSV Format without a header: "Use First Row for data" must be enabled.

CSV without header

Step 2. Create a flow, which creates a CSV file with the format created in Step 1.

Step 3. Create a Merge CSV Files flow where the source is the file created in Step 2 and the destination is a file in the desired CSV format.

Adding a custom header to a CSV file

Problem: I need to create a CSV file with a custom header. For example, to submit data into Amazon MWS, a header with metadata (such as template version) must be added at the beginning of the file, followed by an end-of-line character, followed by the actual data.

Solution: when configuring the CSV format, set the Transformation type to header and specify the Header parameter.

Add header to CSV

Reading a CSV file with a header not in the first line

Problem: I have a CSV file and the first, say 1 or 2 lines belong to a custom header. How do I skip those lines and start at line 3, since that line is the actual column header?

Solution: when creating the CSV format, enter the number of the row where the header starts, into the Start row field. Optionally enter the number of the last row to read, into the End row field.

CSV start row

Column names with special characters

Problem: My Excel worksheet (or text file) contains columns with special characters. For example: First name??%#abc. Is there any way to remove them when extracting data without manual mapping?

Solution: when configuring Excel, CSV or a Fixed text format, set Column names compatible with SQL to yes. Integrator will convert the column names to SQL-compatible column names by removing all characters that aren't alphanumeric or spaces.

Column names to SQL

CLOB fields converted to binary string

Problem: My flow extracts data from the database and creates CSV files. I've noticed that columns with CLOB (TEXT, LONGTEXT, etc.) data types are converted to a binary string that looks like the following: H4sIAAAAAAAAAAXByxGCMBAAUC6WEwL5YPBiKZmwLLIMbJhkHc. Expected: a string value like in the source, for example https://domain.com/test.

Solution: when configuring the CSV format, disable Encode CLOB fields using Base64. This option is enabled by default.

disable encoding field value using base64

Working with Excel files

Using an Excel worksheet index when reading Excel workbooks

Problem: can an Excel format point to the sheet number instead of the name of the sheet? For example, we process Excel sheets from different clients and they do not all name Sheets the same way: one client will name it Sheet1 and the other might name it StockOnHand. But it is always the first sheet in the Excel workbook.

Solution: when creating an XLS or XLSX format enter the 1-based index of the worksheet: 1 for the first worksheet, 2 for the second, etc. That is all.

Important: if a worksheet is actually named 1, 2, etc., Integrator will load the worksheet with that name instead of the worksheet at that index.

Using an Excel worksheet name as a field value

Problem: how can I access the actual name of the source worksheet and use it in the destination? For example, I'm extracting data from an Excel file and loading it into a database. I'm accessing the worksheet by using a 1-based index (not by name) and I would like to use the source worksheet name as a value for one of the fields in my destination table.

Solution: you will need to use JavaScript and per-field mapping, to populate a destination field containing the source worksheet name.

Step 1. Create a mapping for when the source field doesn't exist and the destination field is a calculated field.

File name

Step 2. Click the Edit field value function (pen) icon and enter the following line:

// Returns the name of the data object currently being read - for an Excel source, the worksheet name.
destination.getSourceToUse().getDataSet().getDataObjectToRead();

Reading an Excel worksheet with a header not in the first line

Problem: I have an Excel file where the first, say, 5 or 6 rows are "summary" data, while the actual header starts at row 7. How do I skip those rows and start at row 7, with that row being the header?

Solution: when creating an XLS or XLSX format, enter the number of the row where the header starts in the Start row field. Optionally enter the number of the last row to read in the End row field.

Excel start row

Working with XML files

Working with large XML files

In Etlworks Integrator it is possible to work with very large files. Read how to split files and, specifically, split XML files.

Merging XML files

Read how to merge XML files in Integrator.

Using XSL stylesheets to transform XML

Let's assume that you are receiving a source XML from a web service and loading it into a database. Let's also assume that your source XML is a nested data object, like the one below:

<?xml version="1.0" encoding="UTF-8"?>
<GoodsReceivedIntoBond>
     <GoodsReceivedIntoBondHeader WarehouseNo="CTN 12" CustomerCode="Cust1" 
     DocumentNo="00123" DocumentDate="2017-02-08" NumberOfLine="3" DocumentXRef="12345">
          <GoodsReceivedIntoBondLines>
               <StockCode>B3B4</StockCode>
               <StockQuantity>60.0000</StockQuantity>
          </GoodsReceivedIntoBondLines>
          <GoodsReceivedIntoBondLines>
               <StockCode>B5GT</StockCode>
               <StockQuantity>60.0000</StockQuantity>
          </GoodsReceivedIntoBondLines>
          <GoodsReceivedIntoBondLines>
               <StockCode>B121W</StockCode>
               <StockQuantity>360.0000</StockQuantity>
          </GoodsReceivedIntoBondLines>
     </GoodsReceivedIntoBondHeader>
     <GoodsReceivedIntoBondHeader WarehouseNo="CTN 13" CustomerCode="Cust2" 
     DocumentNo="00456" DocumentDate="2017-02-08" NumberOfLine="2" DocumentXRef="6789">
          <GoodsReceivedIntoBondLines>
               <StockCode>B7C7</StockCode>
               <StockQuantity>16.0000</StockQuantity>
          </GoodsReceivedIntoBondLines>
          <GoodsReceivedIntoBondLines>
               <StockCode>B12345</StockCode>
               <StockQuantity>16.0000</StockQuantity>
          </GoodsReceivedIntoBondLines>
     </GoodsReceivedIntoBondHeader>
</GoodsReceivedIntoBond>

Most databases store data in flat "tables" with rows and columns. So, ideally, you would like to denormalize a nested data object, like the one above, and transform it into a flat "table":

Denormalized nested datasets

In Integrator, all you need to do to accomplish this is to use an XML format together with an XSL stylesheet.

Step 1. Create an XSL stylesheet which transforms a nested-source XML, like that above, into a denormalized "flat" XML, like that below:

<?xml version="1.0" encoding="UTF-8"?>
<result>
    <ROW>
       <WarehouseNo>WN</WarehouseNo> 
       <CustomerCode>CC</CustomerCode> 
       <DocumentNo>DN</DocumentNo>
       <DocumentDate>DD</DocumentDate> 
       <NumberOfLine>NOL</NumberOfLine> 
       <DocumentXRef>XREF</DocumentXRef>
       <StockCode>SC</StockCode>
       <StockQuantity>SQ</StockQuantity>
    </ROW>
    <ROW>
       .....
    </ROW> 
</result>

For this solution, we will be using the following XSL stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
        <xsl:variable name="var1_initial" select="."/>
        <result>
            <xsl:for-each select="GoodsReceivedIntoBond/GoodsReceivedIntoBondHeader">
                <xsl:variable name="var2_cur" select="."/>
                <xsl:for-each select="GoodsReceivedIntoBondLines">
                    <xsl:variable name="var3_cur" select="."/>
                    <ROW>
                        <WarehouseNo>
                            <xsl:value-of select="$var2_cur/@WarehouseNo"/>
                        </WarehouseNo>
                        <CustomerCode>
                            <xsl:value-of select="$var2_cur/@CustomerCode"/>
                        </CustomerCode>
                        <DocumentNo>
                            <xsl:value-of select="number($var2_cur/@DocumentNo)"/>
                        </DocumentNo>
                        <DocumentDate>
                            <xsl:value-of select="$var2_cur/@DocumentDate"/>
                        </DocumentDate>
                        <NumberOfLine>
                            <xsl:value-of select="number($var2_cur/@NumberOfLine)"/>
                        </NumberOfLine>
                        <DocumentXRef>
                            <xsl:value-of select="number($var2_cur/@DocumentXRef)"/>
                        </DocumentXRef>
                        <StockCode>
                            <xsl:value-of select="StockCode"/>
                        </StockCode>
                        <StockQuantity>
                            <xsl:value-of select="number(StockQuantity)"/>
                        </StockQuantity>
                    </ROW>
                </xsl:for-each>
            </xsl:for-each>
        </result>
    </xsl:template>
</xsl:stylesheet>

Important: note the <ROW>...</ROW> tag in the stylesheet.

Step 2. Create an XML format in Integrator which uses the XSL stylesheet above.

Note:

XML Formatting with an XSL style sheet

Step 3. Use the format created in Step 2 as your source (FROM) format.

Creating complex XML docs using an XSL transformation

Let's assume that you need to create an XML document like the one below:

<Transaction>
   <Invoice>AR</Invoice>
   <InvoiceNo>654058</InvoiceNo>
   <InvoiceType>RI</InvoiceType>
   <AccountNo>1601001223</AccountNo>
   <InvoiceDate>20160429</InvoiceDate>
   <BranchId>64</BranchId>
   <Department>15</Department>
   <FileRef>F21557</FileRef>
   <InvoiceTotal>13469.02</InvoiceTotal>
   <InvoiceLine>
      <BillingCode>G37T</BillingCode>
      <TransactionDesc>NEX6636-RFC</TransactionDesc>
      <Amount>642.84</Amount>
      <Vat>90</Vat>
   </InvoiceLine>
   <InvoiceLine>
      <BillingCode>G37T</BillingCode>
      <TransactionDesc>NEX308707-RFC</TransactionDesc>
      <Amount>60.97</Amount>
      <Vat>8.53</Vat>
   </InvoiceLine>
   <InvoiceLine>
      <BillingCode>G37T</BillingCode>
      <TransactionDesc>NEX307851-RFC</TransactionDesc>
      <Amount>233.52</Amount>
      <Vat>32.69</Vat>
   </InvoiceLine>
</Transaction>

Let's also assume that your source is a flat file, or data that comes from one or more database tables. To simplify this example, we will not be explaining how to select data from the database or read a flat file. Our source dataset will look like the one below:

Flat datasets

To simplify, we will be using the same columns in the source as we will need in the destination:

Columns

Of course, if the columns are different, you can use any of the mapping techniques available in Integrator.

Step 1. Create a source connection.

Step 2. Create a source format if needed. This step is optional if we are going to read data from a database.

Step 3. Create a destination connection.

Step 4. Create an XML format for the destination. When creating a format, a) uncheck XML Document, b) set Start Element Name to Transaction and c) set Row Tag to InvoiceLine:

XML with XSLT

Step 5. Create a flow where the source (FROM) uses the connection and (optional) format created in Steps 1-2, and the destination (TO) uses the connection and format created in Steps 3-4.

Step 6. Execute the flow and check the output. It should look like this (below):

<Transaction>
     <InvoiceLine>
          <Invoice>AR</Invoice>
          <InvoiceNo>654058</InvoiceNo>
          <InvoiceType>RI</InvoiceType>
          <AccountNo>1601001223</AccountNo>
          <InvoiceDate>20160429</InvoiceDate>
          <BranchId>64</BranchId>
          <Department>15</Department>
          <FileRef>F21557</FileRef>
          <InvoiceTotal>13469.02</InvoiceTotal>
          <BillingCode>G37T</BillingCode>
          <TransactionDesc>NEX6636-RFC</TransactionDesc>
          <Amount>642.84</Amount>
          <Vat>90.0</Vat>
     </InvoiceLine>
     <InvoiceLine>
          <Invoice>AR</Invoice>
          <InvoiceNo>654058</InvoiceNo>
          <InvoiceType>RI</InvoiceType>
          <AccountNo>1601001223</AccountNo>
          <InvoiceDate>20160429</InvoiceDate>
          <BranchId>64</BranchId>
          <Department>15</Department>
          <FileRef>F21557</FileRef>
          <InvoiceTotal>13469.02</InvoiceTotal>
          <BillingCode>G37T</BillingCode>
          <TransactionDesc>NEX308707-RFC</TransactionDesc>
          <Amount>60.97</Amount>
          <Vat>8.53</Vat>
     </InvoiceLine>
     <InvoiceLine>
          <Invoice>AR</Invoice>
          <InvoiceNo>654058</InvoiceNo>
          <InvoiceType>RI</InvoiceType>
          <AccountNo>1601001223</AccountNo>
          <InvoiceDate>20160429</InvoiceDate>
          <BranchId>64</BranchId>
          <Department>15</Department>
          <FileRef>F21557</FileRef>
          <InvoiceTotal>13469.02</InvoiceTotal>
          <BillingCode>G37T</BillingCode>
          <TransactionDesc>NEX307851-RFC</TransactionDesc>
          <Amount>233.52</Amount>
          <Vat>32.69</Vat>
     </InvoiceLine>
</Transaction>

As you can see, we are almost there, but not quite. We still need to move the invoice-level fields (Invoice, InvoiceNo, etc.) outside of the <InvoiceLine> blocks. We will be using an XSLT transformation to reformat the XML into the desired format.

Step 7. Modify the XML format created in Step 4 by adding the following XSL stylesheet.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:xs="http://www.w3.org/2001/XMLSchema" 
exclude-result-prefixes="xs">
  <xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <xsl:variable name="var1_initial" select="."/>
    <Transaction>
      <Invoice><xsl:value-of select="Transaction/InvoiceLine/Invoice"/></Invoice>
      <InvoiceNo><xsl:value-of select="Transaction/InvoiceLine/InvoiceNo"/></InvoiceNo>
      <InvoiceType><xsl:value-of select="Transaction/InvoiceLine/InvoiceType"/></InvoiceType>
      <AccountNo><xsl:value-of select="Transaction/InvoiceLine/AccountNo"/></AccountNo>
      <InvoiceDate><xsl:value-of select="Transaction/InvoiceLine/InvoiceDate"/></InvoiceDate>
      <BranchId><xsl:value-of select="Transaction/InvoiceLine/BranchId"/></BranchId>
      <Department><xsl:value-of select="Transaction/InvoiceLine/Department"/></Department>
      <FileRef><xsl:value-of select="Transaction/InvoiceLine/FileRef"/></FileRef>
      <InvoiceTotal><xsl:value-of select="Transaction/InvoiceLine/InvoiceTotal"/></InvoiceTotal>
      <xsl:for-each select="Transaction/InvoiceLine">
        <xsl:variable name="var2_cur" select="."/>
        <InvoiceLine>
          <BillingCode>
            <xsl:value-of select="BillingCode"/>
          </BillingCode>
          <TransactionDesc>
            <xsl:value-of select="TransactionDesc"/>
          </TransactionDesc>
          <Amount>
            <xsl:value-of select="number(Amount)"/>
          </Amount>
          <Vat>
            <xsl:value-of select="number(Vat)"/>
          </Vat>
        </InvoiceLine>
      </xsl:for-each>
    </Transaction>
  </xsl:template>
</xsl:stylesheet>

XSLT

Step 8. Execute the flow again. This time, the output will look like the following, exactly what we need:

<Transaction>
   <Invoice>AR</Invoice>
   <InvoiceNo>654058</InvoiceNo>
   <InvoiceType>RI</InvoiceType>
   <AccountNo>1601001223</AccountNo>
   <InvoiceDate>20160429</InvoiceDate>
   <BranchId>64</BranchId>
   <Department>15</Department>
   <FileRef>F21557</FileRef>
   <InvoiceTotal>13469.02</InvoiceTotal>
   <InvoiceLine>
      <BillingCode>G37T</BillingCode>
      <TransactionDesc>NEX6636-RFC</TransactionDesc>
      <Amount>642.84</Amount>
      <Vat>90</Vat>
   </InvoiceLine>
   <InvoiceLine>
      <BillingCode>G37T</BillingCode>
      <TransactionDesc>NEX308707-RFC</TransactionDesc>
      <Amount>60.97</Amount>
      <Vat>8.53</Vat>
   </InvoiceLine>
   <InvoiceLine>
      <BillingCode>G37T</BillingCode>
      <TransactionDesc>NEX307851-RFC</TransactionDesc>
      <Amount>233.52</Amount>
      <Vat>32.69</Vat>
   </InvoiceLine>
</Transaction>

Batch-transforming XML docs using XSLT

Problem: we have a lot of identical XML documents stored on a hard drive, which we need to transform to a different format.

Solution: you can use an XSLT flow for batch-transforming XML docs.

Step 1. Create connections for the source and destination documents. It can be the same connection. Only Server Storage connections are allowed for this flow type.

Step 2. Start creating the flow by typing xsl into the Select Flow Type dialog box:

XSLT flow

Step 3. Continue by adding one or more transformations: select the source and destination connections created in Step 1, enter the filename or a wildcard filename for the source XMLs into the FROM field, and enter the (optional) destination filename into the TO field. If TO is empty, each destination file will be created with the same name as the source file. Wildcard filenames are not supported in the TO field.

Important: this flow takes the files matching the FROM field from the source connection and creates files in the destination connection. If the source connection and the destination connection point to the same folder and the TO field is empty, the original source file(s) will be overwritten. It is highly recommended that you use a different folder for the destination connection. You can use a Create folder(s) flow type to create folders at runtime.

batch XSLT flow

Parsing XML attributes.

By default, if the source XML contains attributes, such as <node attribute="value">, the attributes will not be automatically parsed. Enable parsing attributes by creating a new XML format and selecting the flag Parse XML Attributes.

Not parsing XML attributes in the root node.

If the flag Parse XML Attributes for the XML format is enabled, Etlworks Integrator will parse the attributes in all nodes, including the root node. Frequently, the attributes in the root node are in fact XML schema(s), such as <data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">. To disable parsing the root attributes, create a new XML format and disable the flag Parse XML Attributes in Root Node. Note: this flag has no effect if Parse XML Attributes is disabled.

Not parsing CDATA section

By default, Etlworks Integrator will parse the CDATA sections in XML:

<name>
   <![CDATA[this is a test]]>
</name>

Disable this flag if you are absolutely sure that there are no CDATA sections in the XML to parse. This will enable the XML to be parsed slightly faster.

Working with local files

Local files are files on a storage device attached to your computer, or on a network device accessible from your computer.

In Integrator, you can copy (upload) local files into any of the available file storage or cloud storage locations, including server storage. In fact, the simplest way to start working with local files is to upload them into server storage.

Step 1. Create a server storage connection.

Step 2. Use Explorer to upload the local files.

Important: in order to upload local files, you must have access to Integrator's add-on, Explorer.

Working with Google Cloud Storage

Enabling the Interoperability API

The Interoperability API lets you use HMAC authentication and lets Cloud Storage interoperate with tools written for other cloud storage systems, such as Amazon S3. Integrator requires the Interoperability API to be enabled.

To enable it, go to the Google Cloud console->Storage->Settings->Interoperability->Enable, and create a new Access Key and Secret pair.

configure access to google cloud storage

Working with Amazon S3

Working with region-specific buckets

The default endpoint for the Amazon S3 web service is s3.amazonaws.com. It will work in 99% of the cases. If a created bucket is limited to only a specific AWS region, you might need to enter that region's specific endpoint, such as s3-us-east-2.amazonaws.com.

s3 endpoint

Adding metadata when creating files in S3 (also works for Google Cloud Storage)

The metadata can be used to implement server-side encryption or simply user-defined fields.

To add metadata to the files created in S3, use HTTP headers when configuring your S3 connection.

S3 metadata

Note: all Amazon S3 headers have the prefix x-amz-, even if you don't set them.

Enabling a Requester Pays option

A bucket in S3 is normally configured so that the bucket's owner pays all the service fees for accessing, sharing and storing the objects. In contrast, S3's Requester Pays feature allows a bucket to be configured so that the individual who sends requests to a bucket is charged for the S3 request and data transfer fees, instead of charging the bucket's owner.

To enable the Requester Pays option, just select Requester Pays when configuring the S3 connection.

S3 requester pays

Archive files before copying them to an S3 bucket (this also works for Google Cloud Storage)

Integrator can archive files before copying them to an S3 bucket, using one of the supported algorithms (zip or gzip). Since cloud storage is typically a paid service, you can save yourself time and money by archiving files.

To enable this option, select Zip or GZip for Archive file before copying to S3.

S3, archived files

Working with data

Handling columns with a blank name

It is possible that a column in the source dataset doesn't have a name (has a blank name). For example:

empty column name

Integrator handles it by automatically generating a name for the "empty name" column. The new name is generated as column + column index. For example column2.

empty column

If it is truly an "empty column" (no name and no data), you can easily exclude it in the mapping or by using SQL, like the statement below:

select "Item Code", "Item Description", Group, "Qty On Hand"

If a column has no name but does have data - just use a new auto-generated name, such as column2.
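For example (a sketch using the same hypothetical columns as above), the auto-generated name can be referenced in a source query just like any other column:

select "Item Code", "Item Description", "Group", "Qty On Hand", column2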

Reading data from one dataset and loading it into multiple DB tables

Let's assume there is a "flat" dataset, like the one below:

Denormalized nested datasets

As you can see, the values in the first 6 columns are repeated for multiple rows, grouped by the first 3 rows and the last 2 rows. The values in the last 2 columns are unique. This is because the table above was created by transforming a nested dataset into a denormalized, "flat" dataset. See the example above for more details.

Now, let's assume that we want to read the dataset above and load it into 2 database tables instead of just one:

Normalized tables

Basically DocumentXRef is the primary key in the Order table and a foreign key in the Item table. The relationship between Order and Item is one-to-many (one order matches many items).

If our example has Item.ItemId as an auto-generated primary key in the Item table, then we don't have to worry about generating a value for Item.ItemId - the database will take care of it automatically.

The technique we are going to use in this example is fairly simple: read the same source dataset (file) twice and use it to insert data into two tables. We will be creating two transformations with the same source and different destinations (two tables). The advantage of this technique is that it does not require any programming.

Step 1. Create a source connection and format which will be used to read data from the source dataset.

Step 2. Create a destination which is a database connection.

Step 3. Start creating a flow in the Flow Builder window by clicking the + button and selecting Databases->Extract data from files, load into database (or something similar, as long as the source is a data object and the destination is a database).

Step 4. Let's create the first transformation, which inserts data into the Order table. Start by selecting a source (FROM) connection and format, and destination (TO) connection. Continue by entering a filename into the FROM field, for example test.xml and entering order into the TO field.

One source to multiple tables

Step 5. Click the MAPPING button and create a mapping which only includes columns from the table Order and excludes columns from the table Item:

Order mapping

Step 6. Select the Parameters tab, scroll down to the Remove Duplicates field and enter DocumentXRef:

Remove duplicates

Note: by entering DocumentXRef in the Remove Duplicates field you activate a transformation which excludes all records with the same value in the DocumentXRef, except for the first occurrence.

Step 7. Uncheck Parallel and save the transformation.

Step 8. Continue by adding a second transformation, which inserts data into the Item table. Use the same FROM and TO connection, format and filename as in Step 4. Enter item into the TO field.

Order and Item transformations

Step 9. Click the MAPPING button and only include columns from the table Item while excluding columns from the table Order. Only the column DocumentXRef exists in both:

Item mapping

Step 10. Select the Parameters tab, uncheck Parallel and save the transformation.

Step 11. Save the flow.

Executing SQL queries on any dataset

Problem: how can I calculate a summary or average value of a certain column in a source flat file?

Solution: in Integrator you can actually do much more. You can execute a complex SQL query on source datasets of any type, including flat files, nested data objects returned by web services, email attachments and values from key-value storage.

When creating a transformation, enter a query into the Source query field on the Mapping tab.

Source SQL

Important: data streaming will be automatically disabled if you use SQL queries together with data objects (as opposed to relational databases), because Integrator must read the whole dataset into memory before it can execute the query.

Writing SQL queries

Writing SQL queries which will be executed on a dataset of any type is not much different from writing queries which will be executed on a relational database:

Using aggregation functions

Suppose that the source data object is an Excel file, containing order items:

Order items

The goal is to read a source Excel file and load it into two database tables:

To split a single source dataset into two, we will be using the technique described here.

To create the first transformation, which populates the Order table, simply write a Source SQL query which includes an aggregation function, as in the example below:

select "Order Number" OrderNum, "Order Date" as OrderDate, sum(amount) OrderAmount
group by "Order Number"

Note: we are using the group by expression together with SUM(amount).

The following aggregation functions are supported:

The group by expression can include one or more comma-separated field name(s) to group the dataset.

Note: Expressions are not supported as parameters for aggregation functions. For example, sum(amount) is supported, but sum(amount * quantity) is not.

Writing queries for nested data objects.

This is an example where the source is a nested data object returned by a web service.

{  
   "requestId":"185d6#14b51985ff0",
   "success":true,
   "result":[  
      {  
         "id":"433455454",
         "name":"all fields", 
         "searchableFields":[  
            {  
               "name":"field1",
               "displayName":"Just field 1",
               "dataType":"string",
               "length":36,
               "updateable":false
            },
            {  
               "name":"field2",
               "displayName":"Just field 2",
               "dataType":"string",
               "length":50,
               "updateable":false
             }
         ],
         "fields":[  
            {  
               "name":"leadId",
               "displayName":"Lead Id",
               "dataType":"integer",
               "updateable":false
            },
            {  
               "name":"role",
               "displayName":"Role",
               "dataType":"string",
               "length":50,
               "updateable":false
            },
            {  
               "name":"isPrimary",
               "displayName":"Is Primary",
               "dataType":"boolean",
               "updateable":true
            },
            {  
               "name":"externalCreatedDate",
               "displayName":"External Created Date",
               "dataType":"datetime",
               "updateable":true
            }
         ]
      }
   ]
}

The goal is to extract all searchableFields and fields from the nested data object and create a flat CSV file.

This can easily be accomplished by using the following SQL query, which is applied to the nested source data object:

select name, displayName, 'searchable' as fieldType
from result.searchableFields
union 
select name, displayName, 'regular' as fieldType
from result.fields

The from clause is of primary importance here. When writing a query to be executed on a nested data object, you can use it to specify the level from which the data will be extracted. Use . as a delimiter to drill down into the nested hierarchy of objects.

Additionally, when querying data from a nested data object, you can use union and intersect, if the list of fields is the same for all queries that are combined by using union or intersect.

If you want to include some of the fields from the parent object, as well as fields from the nested object, qualify each field with the name of the object that owns it (the owner. qualifier):

select result.name as apiName, searchableFields.name, searchableFields.displayName, 'searchable' as fieldType
from result.searchableFields
union  
select result.name as apiName, fields.name, fields.displayName, 'regular' as fieldType
from result.fields

Using staging tables to load data into a database

Problem: I want to use custom SQL to load data from files into a database. Is it possible to load data into some sort of staging table, and then execute SQL to update the actual tables?

Solution: In Integrator, you can load the contents of a file into a staging table and then execute custom SQL statements.

  1. Create a flow which loads data from the file into a staging table. Use the flow type Extract data from files, load into database. Keep in mind that Integrator can create the staging table for you if it does not already exist.
  2. Create an SQL flow which drops or clears the staging table.
  3. Create one or more SQL flows which update the actual tables, using data from the staging table (a sketch of the SQL follows this list).
  4. Create a nested flow that combines the flows created in Steps 2, 1 and 3 (in that order).
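For illustration, the SQL flows in steps 2 and 3 could run statements like the sketch below; the staging_orders and orders tables and their columns are hypothetical:

-- step 2: clear the staging table before the load
delete from staging_orders;

-- step 3: update the actual table using the data loaded into the staging table
insert into orders (order_id, customer_id, amount)
select order_id, customer_id, amount
from staging_orders;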

Working with complex nested data objects

Problem: I'm reading data from a web service and creating a file in SFTP. The created file seems to be missing a lot of attributes, which were present in the source. Is it possible to preserve all the attributes from the source?

Solution: by default, Integrator will try to use streaming, which is another way of saying "please keep as little data in memory as possible". Generally speaking, streaming is faster and has a very low memory footprint. Unfortunately, streaming is not always possible, for instance when dealing with complex nested data objects, which are typically returned by web services. In such a case, it is always a good idea to disable streaming.

  1. Create a transformation.
  2. Select the Parameters tab.
  3. Uncheck Stream Data.

Do not stream

Accessing raw data from the source (before a transformation)

Problem: I'm reading data from a web service (or file), for example as a JSON document. Is there any way to access the original, raw data from the source before it gets "transformed" by Integrator?

Solution: use the Preprocessor feature, which is available for the text formats (JSON, CSV, XML, etc.).

store the original message

Step 1. Create a text format, such as CSV, XML, JSON.

Step 2. While creating the format select Preprocessor for the Transformation Type field.

Step 3. Enter the following code into the Preprocessor field. Use a unique key for SystemConfig.instance().getProperties().put. Store the key.

importPackage(com.toolsverse.config);

SystemConfig.instance().getProperties().put("rawmessage", message);

value = message;

Step 4. Now you can access the original raw message from anywhere, using the following JavaScript:

importPackage(com.toolsverse.config);

var message = SystemConfig.instance().getProperties().get("rawmessage");

Working with databases

Inserting auto-generated value

If a column is one of the auto-generated data types, for example, identity, simply exclude it from the mapping.

MERGE (UPSERT)

Problem: I'm migrating data from one database to another. I only want to INSERT new records, but UPDATE the existing ones.

Solution: use the MERGE action when configuring the source-to-destination transformation.

Important: Integrator supports MERGE for Oracle, MS SQL Server, PostgreSQL, MySQL, DB2, Informix and Sybase databases. If your database is not on the list or if you don't want to use MERGE, read how to configure the IfExist action.

Step 1. Create a flow with at least one transformation where the database is a destination.

Step 2. When configuring a transformation, click MAPPING and then select the Parameters tab.

Step 3. Select MERGE as the action and enter a comma-separated list of field names which will be used to identify a unique record.

Merge action
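For reference, the statement that the target database ends up executing for a MERGE action looks roughly like the sketch below (PostgreSQL syntax, a hypothetical customer table with customer_id as the unique key, {tokens} standing for the source field values; the actual statement is generated by Integrator and differs from database to database):

insert into customer (customer_id, first_name, last_name)
values ({customer_id}, {first_name}, {last_name})
on conflict (customer_id)
do update set first_name = excluded.first_name,
              last_name = excluded.last_name;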

UPDATE if record exists, otherwise INSERT

The alternative to MERGE is the IfExist action. Instead of relying on the native MERGE included in the target database engine, Integrator executes an SQL statement to check whether the record already exists; if it does, Integrator executes an UPDATE, and if it does not, an INSERT.
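Conceptually, for every record the IfExist action behaves like the sketch below (a hypothetical customer table with customer_id as the unique key, {tokens} standing for the source field values; the actual statements are generated by Integrator):

-- check whether the record already exists
select count(*) from customer where customer_id = {customer_id};

-- if it exists, update it
update customer
set first_name = {first_name}, last_name = {last_name}
where customer_id = {customer_id};

-- if it does not exist, insert it
insert into customer (customer_id, first_name, last_name)
values ({customer_id}, {first_name}, {last_name});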

Use IfExist if:

Step 1. Create a flow which includes at least one transformation where a database is the destination.

Step 2. When configuring that transformation, click MAPPING and then select the Parameters tab.

Step 3. Select IfExist as the action and enter a comma-separated list of field names which will be used to identify a unique record.

the IfExist action

Using a temporary (staging) database

If you want to process multiple files from one (input) folder and create a single output file in a different format, it would be nice if Integrator could support the following scenario:

1 - Load all the input files into a staging database.

2 - Use SQL to query the staging database, creating an output file.

3 - Clean up the staging database.

In Integrator, it is quite easy to implement the scenario above, thanks to its built-in ability to process multiple files in a staging folder and its support for temporary databases.

Step 1. Create a connection to a temporary database.

Step 2. Create a flow which reads multiple source files and inserts data into the staging database, using the connection created in Step 1.

Step 3. Create a flow which reads data from the staging database and creates an output file.

Step 4. Create a nested flow which combines the flows created in Steps 2 and 3.

Staging

Read records from a database and execute the flow for each record

Use a Nested flow with the Loop type set to SQL, in order to execute the flow for each record in the result set.

Step 1. Create a database connection which will be used to execute the driving (loop) SQL.

Step 2. Create any flow, for example, a flow which reads data from a database and creates files, or a flow which executes a web service. We're assuming that you want to create a loop that executes this flow for each record in the result set.

Step 3. Create a nested flow and add the flow created in Step 2.

Step 4. Click the Pen icon, next to the flow name. The Parameters pop-up window will open.

SQL loop

Step 5. Select SQL for Loop type.

Step 6. Select the connection created in Step 1 for the Connection.

Step 7. Enter the driving SQL as the Loop script, for example: select roleid from role.

Step 8. Modify the flow created in Step 2 to use fields from the result set. Enclose the field name in curly brackets: {field name}. For example: {roleid}.

You can use the {field name} in:

- the FROM and TO fields for a transformation

from and to with parameters

- a Connection URL

- Anywhere in the source or destination query

SQL with parameters

Important: when referencing {parameters}, make sure you are using the same CaSe for the field name as returned by the database. For example, some databases enforce UPPERCASE for column names returned by executing the SQL query: select roleid from role --> ROLEID, while other databases keep the original: select roleid from role --> roleid.

Debugging a database loop

Problem: when executing flows in a loop, using the technique explained above, it is possible that the fields in the SQL result set which drive the loop will be null or empty. If that happens, Etlworks Integrator will not be able to replace the {tokens} with actual values, and the flow in the loop may fail.

For example:

If the flow to be executed in the loop uses a source connection with a tokenized URL: http://host.com?startDate={startDate}&endDate={endDate}.

The driving SQL is: select startDate, endDate from config.

But for some records the value of the startDate field is null, while for others it has an actual non-empty value, for example yesterday.

As a result, when Etlworks Integrator tries to replace the token {startDate} in the URL of the source connection, it will not find a non-empty value, so the URL will remain unchanged: http://host.com?startDate={startDate}. Most likely, this will cause the flow to fail.

Solution: when Etlworks Integrator detects that some of the values in the variable-value pairs, which are used to substitute {variable} tokens, are null or empty, it adds the following line to the flow log:

WARNING: scenario 'scenario name' will be executed with an empty value for the variable {variable name}.

View flow log from the Flow Builder

View flow log from the audit-trail grid

Inserting images from files into a database table

Let's assume that our target database supports inserting images from files. The example below was created for MS SQL Server.

Let's also assume that there is a folder which either already contains images, or you copied/moved images to that folder using the Copy or Move Files ETL flow.

Important: the folder must be accessible by the target database.

Lastly, let's assume that there is an image table like the one below (as an example):

CREATE TABLE image (
    Id INT IDENTITY(1,1) PRIMARY KEY,
    FileName VARCHAR(255) NOT NULL,
    Image VARBINARY(max) NOT NULL
);

Step 1. Create a database connection to the target database.

Step 2. Create a Server Storage connection to the folder with images. Let's call this connection Local images.

Step 3. Create a JavaScript flow. This is an example of some JavaScript code that you might want to use. Note the URL passed to the FileManagerTask.list method: it must be the same as the URL of the connection created in Step 2.

importPackage(com.toolsverse.etl.core.task.common);
importPackage(com.toolsverse.etl.common);

var alias = new Alias();
alias.setUrl('the same URL as in Step 2');

// store the list of files matching the wildcard *.* so the loop script in Step 7 can use it
etlConfig.setValue('image files to load', FileManagerTask.list(alias, '*.*'));

Step 4. Create an Execute SQL flow. Use the connection which was created in Step 1. This is an example of SQL statements that you might want to use. Note: the token {FILE_TO_LOAD} is used.

INSERT INTO Image(FileName, Image) 
VALUES ('{FILE_TO_LOAD}', (SELECT * FROM OPENROWSET(BULK N'{FILE_TO_LOAD}', SINGLE_BLOB) AS Image))

Important: in this example, we are using MS SQL Server syntax to insert an image from the file. Check your target database documentation for more examples.

Step 5. Create a Delete Files flow. Use the connection created in Step 2 as a source. Enter {FILE_TO_DELETE} into the FROM field.

Step 6. Create a Nested Flow, which combines the flows created in Steps 4 and 5 (in that order).

Step 7. Create a final Nested Flow.

7.1 Add the flow created in Step 3.

7.2 Add the flow created in Step 6. Click the Edit condition button (pen icon).

7.3 Add the following code as a Loop script. Note: these tokens are used: {FILE_TO_LOAD} and {FILE_TO_DELETE}.

importPackage(com.toolsverse.config);
importPackage(com.toolsverse.etl.common);

var files = etlConfig.getValue('image files to load');

if (!files.isEmpty()) {
  // set the {FILE_TO_LOAD} variable on the first inner flow (the Execute SQL flow created in Step 4)
  var variable = scenario.getExecute().get(0).getVariable('FILE_TO_LOAD');

  if (variable == null) {
      variable = new Variable();
      variable.setName('FILE_TO_LOAD');
      scenario.getExecute().get(0).addVariable(variable);
  }

  variable.setValue(files.get(0).getPath());

  // set the {FILE_TO_DELETE} token used by the Delete Files flow
  SystemConfig.instance().getProperties().put('FILE_TO_DELETE', files.get(0).getName());

  // remove the processed file from the list
  files.remove(0);
}

// a non-null value in the last line means "continue the loop"
value = files.isEmpty() ? null : 'continue';

Step 8. Done. You can now execute the flow created in Step 7. It will insert images stored in local files into a database table and delete the files one-by-one, until there are no files left.

Working with web services and APIs

Setting HTTP headers

When configuring an HTTP connection, you can (optionally) add HTTP headers as key-value pairs.

Note: key is required but value can be empty.

HTTP headers

Making asynchronous HTTP calls

Enable asynchronous HTTP calls for the POST, PUT, and DELETE endpoints by setting the value of the property Maximum Asynchronous HTTP Calls to greater than zero (0 is the default).

Asynchronous HTTP

Dynamic URLs

Problem: I'm pulling data from a Web Service with date parameters that are contained in the line: http://url?periodstart=YYYYMMDD&periodend=YYYYMMDD. Is there a way to construct the URL string to dynamically set dates?

Solution: In Integrator, you can use JavaScript and a tokenized URL to dynamically set the URL parameters.

Note: you can use this technique anywhere there is a URL, Path, Directory, Bucket, Key or File(s).

Step 1. Create a connection to the Web Service using a tokenized URL:

URL example: https://my.api.com/metrics/?apiKey=111-222-333&periodStart={period_start}&periodEnd={period_end}

Important: notice the tokens {period_start} and {period_end}. Make sure that the token names are unique for each endpoint.

Dynamic URL

Step 2. Create a flow that pulls data from the web service, where the source (FROM) connection is the connection created in Step 1.

Step 3. Create a JavaScript flow, which will be used to set tokens.

Below is an example of the JavaScript code:

importPackage(com.toolsverse.config);
importPackage(com.toolsverse.util); // Utils.date2Str (used below) is assumed to live in this package
importPackage(java.util);

// getting properties
var props = SystemConfig.instance().getProperties();

// getting instance of the Calendar
var cal = Calendar.getInstance();

// calculating today
var now = new java.util.Date(cal.getTimeInMillis());

// calculating 50 days ago
cal.add(Calendar.DATE, -50);
var then = new java.util.Date(cal.getTimeInMillis());

// setting tokens
props.put("period_start", Utils.date2Str(then, "yyyy-MM-dd"));
props.put("period_end", Utils.date2Str(now, "yyyy-MM-dd"));

Set the URL tokens

Step 4. Create a nested flow which includes the flows created in Step 2 and Step 3.

Setting tokens and executing the flow

Step 5. Manually execute or schedule the flow created in Step 4.

Debugging an HTTP request

Read how to identify and fix issues that arise when connecting to a Web service.

Authentication when connecting to a web service

Problem: I'm trying to connect to a web service using an HTTP connector and I cannot figure out how to configure the authentication parameters.

Solution: Integrator natively supports basic, token-based 2-step (OAuth2) and one-step (OAuth1) authentication. It is also possible to connect to a web service which is using a mixture of these three.

Basic Authentication (also known as username/password authentication).

  1. When configuring an HTTP connection, select basic as the value for the Authentication field.
  2. Enter the username and password into the User and Password fields.

Basic Authentication with additional parameters. In some cases, additional parameters (besides username and password) are required to connect to a web service.

  1. When configuring an HTTP connection, select none as the value for the Authentication field.
  2. Make sure that POST or PUT is selected as the value for the Method field.
  3. Enter additional parameters in the JSON format into the Payload field. Use the tokens {user} and {password} as substitutes for the username and password. For example: {"username":"{user}","password":"{password}","labName":"Demo"}.

API Key Authentication.

  1. When configuring an HTTP connection, select none as the value for the Authentication field.
  2. When configuring the URL, use the token {password} as a substitute for the actual API key. For example: https://api.com/v1/metrics/timeseries?apiKey={password}.
  3. Enter an API key into the Password field.

Two-step token-based authentication (OAuth2).

  1. Select token or oauth2 as the value for the Authentication field.
  2. Enter the API endpoint used to receive an access token in the Authentication URL field. For example: https://777-XWC-041.mktorest.com/identity/oauth/token?grant_type=client_credentials&client_id={user}&client_secret={password}
  3. Select a value for the field HTTP Method for Token Authentication. Typically, it is POST.
  4. Enter a value for the optional Refresh Token field. A refresh token can be used to obtain an access token.
  5. Enter a value for the Access Token Attribute field. When the authentication endpoint is called, it typically returns an access token as part of a response in the JSON format. Use this field to specify the name of the attribute in JSON which contains an access token. For example: access_token.
  6. Enter a value for the Access Token Prefix field. A prefix is used for the authentication token header parameter. The default is Bearer.
  7. Enter an authentication payload into the Authentication Request Payload field. The following tokens are accessible as part of the payload: {refresh_token}, {user}, {password}. For example: {"login":"{user}","password":"{password}"}.
  8. Select a value for the Authentication Request Content Type field. The default is application/json.
  9. Enter client id into the User field.
  10. Enter client credentials into the Password field.

One-step oauth authentication (OAuth1).

  1. Select oauth1 as the value for the Authentication field.
  2. Enter consumer key into the User field.
  3. Enter consumer secret into the Password field.
  4. Enter access token into the Access Token field.
  5. Enter access secret into the Access Secret field.

Authentication when the authorization header parameter is user-defined (Header).

  1. Select header as the value for the Authentication field.
  2. Change the value for the optional Authentication Header Name field. The default value is Authorization.
  3. Enter the required authorization header parameter into the Content of the Authorization header field. You can use tokens such as {user}, {password} and others. Always use single quotes ('value') instead of double quotes ("value").
  4. Enter an optional username into the User field.
  5. Enter an optional password into the Password field.

SOAP-style Authentication. When working with a SOAP web service, you are typically dealing with a payload that must be electronically signed, in some fashion. There is no single standard way of doing this, so we will merely explain a basic approach.

1. Let's assume that the payload (SOAP envelope) looks something like this (below):

<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" 
xmlns:ns1="http://www.vendor.com/soapendpoint/" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <SOAP-ENV:Header>
    <ns1:AuthenticationHeader>
      <vendorUserId>{requestUserId}</vendorUserId>
      <vendorSignature>{requestSignature}</vendorSignature>
      <vendorTimestamp>{requestTimestamp}</vendorTimestamp>
    </ns1:AuthenticationHeader>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
    <ns1:paramsGetMultipleCars>
      <leadSelector xsi:type="ns1:KeySelector">
        <keyType>MAKE</keyType>
        <keyValues>
          <stringItem>Ford</stringItem>
          <stringItem>Acura</stringItem>
        </keyValues>
      </leadSelector>
      <batchSize>100</batchSize>
    </ns1:paramsGetMultipleCars>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

2. Notice the tokens {requestUserId}, {requestSignature} and {requestTimestamp}. They are part of the electronic signature.

3. We will be using JavaScript to set the values of {requestUserId}, {requestSignature} and {requestTimestamp}.

3.1. Create a flow which works with a SOAP web service.

3.2 Create a JavaScript flow, which will be used to set tokens.

Below is an example of the JavaScript code:

importPackage(java.text);
importPackage(java.util);

importPackage(javax.crypto);
importPackage(javax.crypto.spec);

importPackage(org.apache.commons.codec.binary);

importPackage(com.toolsverse.config);

var props = SystemConfig.instance().getProperties();

// the user id and secret key issued by the web service vendor
var vendorUserId = "user";
var vendorSecretKey = "password";

// build the timestamp in the ISO 8601 format expected by the web service (e.g. 2017-05-17T14:52:41-04:00)
var df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
var text = df.format(new java.util.Date());
var requestTimestamp = text.substring(0, 22) + ":" + text.substring(22);
var encryptString = requestTimestamp + vendorUserId;

// sign timestamp + user id with HMAC-SHA1, using the secret key, and hex-encode the result
var secretKey = new SecretKeySpec(new java.lang.String(vendorSecretKey).getBytes(), "HmacSHA1");
var mac = Mac.getInstance("HmacSHA1");
mac.init(secretKey);
var rawHmac = mac.doFinal(new java.lang.String(encryptString).getBytes());
var hexChars = Hex.encodeHex(rawHmac);
var signature = new java.lang.String(hexChars);

// set the tokens used in the SOAP envelope
props.put('requestSignature', signature);
props.put('requestTimestamp', requestTimestamp);
props.put('requestUserId', vendorUserId);

Notice props.put(...) at the end of the JavaScript code. This is where the actual values of the tokens are set: {requestUserId}, {requestSignature} and {requestTimestamp}.

3.3. Create a nested flow, which includes the flows created in 3.1 and 3.2.

Configuring auto-retry when working with web services

Problem: I want to configure auto-retry so that when a call to the HTTP endpoint fails, the system can retry it multiple times. Is this possible?

Solution: use the Number of retries field when configuring the HTTP connection.

Available parameters:

Auto retries

Copy response from the HTTP request to a file

Problem: I can create a flow which dumps a response from the HTTP call into a file. However, when Etlworks Integrator parses the response, it creates a normalized dataset, which may be slightly different from the original response. Is there a way to create a file which looks exactly as it did when it was returned by the web service?

Solution: use a Copy Files flow with the source connection set to a web service connection and the destination connection set to any file storage connection or Redis. Specify the filename in the FROM field.

Copy the response to file

Alternatively, you can specify the filename in the TO field. Read how Etlworks Integrator will name files when the TO filename is provided.

Using the contents of a file as a payload for an HTTP endpoint

Step 1. Create a Copy Files flow.

Step 2. Select a source connection and enter the source filename into the FROM field. Wildcard filenames are supported.

Step 3. Select a destination HTTP connection.

Important: the destination HTTP connection must be linked to a POST or PUT endpoint.

Testing and exploring API endpoints

Read how to test and explore API endpoints.

Template for the Request

Problem: I'm creating a flow where the destination is an HTTP endpoint. I'd like to use a visual designer to map source columns to the request (payload) columns. Is there any way to create a template which will be used to populate the request columns?

Solution: when configuring the file format which is going to be used for the request, use the Template field to specify columns and data types. For example:

{
  "HostName": "string",
  "DomainName": "string",
  "FullyQualifiedDomainName": "string",
  "Description": "string"
}  

Request template

Important: the template must use the same file format as the destination file: JSON for JSON files, CSV for CSV files, etc. Integrator actually parses the template using the parameters configured for that format.

Creating a payload using a template with tokens

You can use a template with tokens to create a payload for a web service.

Step 1. When creating a connection to the web service, enter the template to use for the payload into the Payload field. Use {token} for tokens.

payload

For example:

{
  "Device": "{name}",   
  "HostName": "{host}",
  "DomainName": "{domain}",
  "FullyQualifiedDomainName": "{domain_full}",
  "Description": "{description}"
}  

Step 2. Set tokens using JavaScript or by executing the flow in a loop.
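
For example, a JavaScript flow along these lines could set the tokens used in the template above (a minimal sketch; the values are hardcoded here purely for illustration):

importPackage(com.toolsverse.config);

// a sketch: set the payload tokens (hardcoded values, for illustration only)
var props = SystemConfig.instance().getProperties();

props.put("name", "device-01");
props.put("host", "host-01");
props.put("domain", "example.com");
props.put("domain_full", "host-01.example.com");
props.put("description", "test device");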

Testing a payload

In many cases it makes sense to test a payload before sending it to a web service.

Step 1. To test the payload, simply replace the destination connection with a connection to file storage or cloud storage, while keeping the same destination format.

Step 2. Execute the flow manually and check the output.

Step 3. Once you are satisfied with the output, swap the destination connection with the connection to the web service.

Important: this technique will not work for a payload created using a template with tokens.

Fields in the JSON or XML response contain special characters

Problem: it is quite common for fields in a JSON or XML response to contain special characters that are not compatible with SQL. As a result, these fields cannot be used in SQL, which limits the ability to filter out records in the response with an SQL query. Example:

[
   {"m:first": "Joe", "m:last": "Doe"},
   ...
]

Solution: when configuring CSV, JSON, XML, or a Fixed text format, set Column names compatible with SQL to yes. Integrator will convert the column names to SQL-compatible column names by removing all characters that aren't alphanumeric or spaces.
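
Alternatively, if you need more control over the renaming, the Preprocessor feature described earlier in this section can be used to clean the raw document before it is parsed. Below is a minimal sketch that strips the m: prefix from the example above (the prefix is specific to this example):

// a sketch: remove the "m:" namespace prefix from field names before the JSON is parsed
value = ('' + message).split('"m:').join('"');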

Flows, mapping and field values

Adding a description to the flow

In Integrator, it is possible to add a free-text description to user-defined flows; the description is displayed next to the flow name in the Flow Builder.

Note: it is also possible to add a description to connections, formats, listeners and schedules.

To add a description, use the description field in the flow editor.

flow description

If a flow has a description, the small i (info) icon is displayed next to the flow name in the Flow Builder.

info icon

To see the description, hover your mouse cursor over the i (info) icon.

display description

Ignoring Exceptions

In some cases it makes sense to ignore all or specific exceptions when performing a transformation. For example, if you are running a flow that reads files in a loop and some of the files no longer exist, the flow would normally throw an exception. To avoid such an error, you can configure the transformation to ignore the file not found exception.

To configure the transformation to ignore an exception, click the mapping button for that transformation, select the On Exception tab, and select Ignore for the On Exception field. Optionally, enter part of the exception message to match against, for example: file not found.

on exception

Executing flows in case of any error

Typically, pipelined flows within a nested flow are executed in order and unconditionally. You can also configure a flow to be executed only when something specific happens, for example, when an error occurs.

The most common use cases for such flows are, for example, performing a cleanup or sending a notification after a failure.

To configure a flow to be executed when there's an error, click the MAPPING button and select the Flow Control tab, then select Execute if Error.

flow control

Important: when a flow is configured to be executed "on error", it will not be executed if there is no error (happy path), so be careful when selecting this option.

Duplicating the flow, all its connections and its formats

Problem: I have a flow, which I'd like to duplicate, make a few changes and schedule. Is this possible?

Solution: in Integrator you can copy a whole flow with all its connections and formats, using Export/Import.

Important: if you are exporting a nested flow - all included flows, connections and formats will be exported as well.

Export and Import

Tip: when you import a flow, Integrator creates new flows, connections and formats with the same parameters as the original flows, connections and formats but adds the suffix [timestamp]. For example: Create new FW xml file [05-17-2017_14-52-41]. It makes sense to change all generated names to something else, for example by removing the [timestamp] suffix and modifying the name. Make sure the new name is unique, otherwise Integrator will not allow you to change it.

Important: if a flow uses the HTTP listener as a source (FROM) or destination (TO), the original endpoint will be updated as well (since there can't be 2 endpoints with the same path):

Copy listener

Make sure that you change the URL Pattern before scheduling the new flow. Once again - the URL Pattern must be globally unique across all listeners.

Combining different types of flows together

Problem: is it possible to combine different types of flows, such as "copy files" and "execute SQL" flows, together and execute them as a single flow?

Solution: in Integrator you can combine any number of flows together, using a nested flow. A nested flow can be scheduled or executed manually.

Executing flows conditionally

Step 1. Create a nested flow.

Step 2. Add the flow to be executed conditionally to the nested flow.

Step 3. Click the pen button next to the flow name.

Step 4. Enter JavaScript conditions into the Condition field.

In JavaScript, the Condition must set the value variable in the last line to true or false:

value = condition;
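
For example, a condition that runs the flow only on weekdays might look like this (a minimal sketch; any JavaScript that sets value to true or false in the last line will work):

importPackage(java.util);

// a sketch: execute the flow only on weekdays
var day = Calendar.getInstance().get(Calendar.DAY_OF_WEEK);

value = day != Calendar.SATURDAY && day != Calendar.SUNDAY;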

Executing flows in a loop

Step 1. Create a nested flow.

Step 2. Add the flow to be executed in a loop to the nested flow.

Step 3. Click the pen button next to the flow name.

Step 4. Select the Loop Type. Depending on the loop type the following options are available:

JavaScript. The loop will run as long as the Loop Condition JavaScript returns a non-null value in its last line (see the sketch after this list).

value = condition ? true : null;

SQL. The loop will run once for each record returned by the loop SQL.

Files by wildcard. The loop will run once for each file whose name matches the wildcard.
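
Below is a minimal sketch of a JavaScript loop condition that runs the inner flow five times, using a counter stored in the global properties (the property name loop_counter is just an example):

importPackage(com.toolsverse.config);

// a sketch: run the loop 5 times using a counter stored in the global properties
var props = SystemConfig.instance().getProperties();

var counter = props.get('loop_counter');
counter = counter == null ? 0 : parseInt(counter, 10);

var done = counter >= 5;

// reset the counter when done, so the next execution of the flow starts from zero
props.put('loop_counter', done ? '0' : '' + (counter + 1));

// a non-null value in the last line means "continue the loop"
value = done ? null : true;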

Executing flows in a loop in parallel

The typical use case for executing flows in a loop in parallel is when the flow hits different endpoints, driven by the loop conditions, or performs file management operations. Doing this in parallel can be much faster compared to sequential execution.

Step 1. Configure the flow to be executed in a loop using any of the available methods.

Step 2. After configuring the loop conditions, enter a value greater than 1 in the Loop Thread field. The value of this field is the maximum number of parallel threads that can be created when executing the inner flow in a loop. The allowed values are 1-99. The default is 1, which means that the inner flow will not be executed in parallel.

Loop threads

Important: at this time, it is recommended to enable parallel execution only for flows that hit different endpoints or perform file management operations.

Use the actual source filename as a destination field value

Problem: how can I get the actual source filename and use it in the destination? For example, I'm extracting data from an Excel file and loading it into a database. I would like to use the source Excel filename as a value in one of the fields in my destination table.

Solution: you will need to use JavaScript and per-field mapping to populate the destination field containing the source filename.

Step 1. Create a mapping where the source field doesn't exist and the destination field is a calculated field.

Filename

Step 2. Click the Edit field value function (pen) icon and enter the following line:

destination.getSourceToUse().getDataSet().getFileNameToRead();

Field value
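
If you only need the base filename without the directory part, a value function along these lines can be used instead (a minimal sketch; whether getFileNameToRead() returns a full path depends on the source connection):

// a sketch: keep only the base filename, stripping any directory part
('' + destination.getSourceToUse().getDataSet().getFileNameToRead()).replace(/^.*[\/\\]/, '');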

Use a field value as a filename

Problem: we created a flow which extracts data from a database and creates files, is it possible to use one of the field values as a filename?

Solution: you will need to use JavaScript and the tokenized value of the TO field.

Let's assume that we are creating a file, using a flow similar to this one.

Step 1. Set up a flow where the destination is a file.

Step 2. When creating the transformation - click the Mapping button and select the tab Parameters.

Step 3. Make sure Stream Data is unchecked.

Step 4. Check Execute Transformation Rules after extract and enter the following code into the Transformation Rules field:

importPackage(com.toolsverse.config);

var props = SystemConfig.instance().getProperties();

// InvoiceNo is just an example. Use whatever field name it makes sense to use.
props.put("InvoiceNo", dataSet.getFieldValue(dataSet.getRecord(0), "InvoiceNo").toString());

Note: InvoiceNo in the code above is just an example. Use whatever field name it makes sense to use.

Step 5. Click Save and Close.

Step 6. Enter {InvoiceNo}.xml into the TO field:

A tokenized TO

Important: use the same variable (token) name as you used in the code example above.

Static column value

Problem: When mapping a database column, sometimes there's only a static value that is placed in one of the destination columns, instead of it being mapped to a column in the source file, for example a static importer code. How do I add static values to a column?

Solution: when creating a mapping for the destination column, just enter the static column value as a "value function".

Step 1. Create a mapping where the source field doesn't exist and the destination field is a calculated field.

Step 2. Click the Edit field value function (pen) icon and enter the field value:

'static value'

Important: if the column value is a string, you must enclose it in single or double quotes, like this: 'static value'.

Current date/time as a column value

Problem: how do I insert the current date and/or time into a column?

Solution: when creating a mapping for the destination column, just enter new java.util.Date() as a "value function".

Step 1. Create a mapping where the source field doesn't exist and the destination field is a calculated field.

Step 2. Click the Edit field value function (pen) icon and enter the field value:

new java.util.Date();

Step 3. Go back to Parameters and uncheck Use Bind Variables.

No bind variables

Step 4. Go to Mapping and click the Field data type button. Select DATE or TIMESTAMP for the Data Type. Optionally, enter a Database specific type.

Field data type
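
If the destination column expects a formatted string rather than a date, a value function along these lines can be used instead (a minimal sketch; the format pattern is just an example):

// a sketch: the current date and time as a formatted string
new java.text.SimpleDateFormat('yyyy-MM-dd HH:mm:ss').format(new java.util.Date());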

Performance tuning

Common performance tips

If there are multiple source-destination transformations of the same type (for example data migration from one database to another) - combine them all into a single flow:

multiple transformations

and enable parallel processing:

parallel processing

Important: if you are loading data into multiple database tables in parallel - make sure that either there are no reference constraints or the reference constraints are disabled. Otherwise, you might get a "reference constraint violation" exception.

Performance tuning when working with files

1) It is usually faster to work with files stored in server storage than with remote files or cloud storage.

2) Use streaming if possible. Streaming is enabled by default, but it might not work if your source or destination files are complex nested data objects. In that case, it is recommended that you disable streaming.

Do not stream

3) If the source file is large, especially if it is an XML file, it makes sense to split it into multiple smaller files, and then process them one-by-one in a loop. Read more about splitting files.

4) If you can, zip files before copying/moving them to remote or cloud file storage.

Zip flow

Performance tuning when working with databases

1) Use streaming and bind variables if possible, but avoid using native SQL.

database performance

2) Force streaming when streaming is not supported.

3) If you are loading large datasets into the database, enable Auto Commit for the destination database connection.

auto commit

4) If you are running custom SQL queries or using the MERGE/UPDATE/DELETE/IfExist actions, make sure that all relevant database indexes exist.

merge action

Administration

Stop the currently running flow

Problem: something happened and the flow seems to run forever. Is there any way to "abort" it?

Solution: use the Flow Builder's Stop/Cancel button.

abort flow

Important: canceling a flow is a command that can take up to a few minutes to complete.

Once the flow is successfully canceled, its status will change from Running to Canceled.

canceled flow