Recipes
25+ patterns lifted from production flows. Examples in JavaScript except where Python adds something specific. Every recipe runs in the inline editor — no external setup.
Mapping fields
Get and set a field's value in Mapping
Getting and setting a field's value is done in field-to-field Mapping. Three ways to read; one way to write.
Read
// Most explicit — works in any field function
value = dataSet.getFieldValue(currentRow, "field name");
// Shorthand for the current field in context (only inside this field's function)
value = fieldValue;
// Token form — slightly slower (preprocessed)
value = {field name};
Write
dataSet.setFieldValue(currentRow, "field name", fieldValue);
Set a field's value using native SQL (inline SQL)
If the destination is a relational database, you can compute a column value with the database's own SQL dialect — pushed down by the engine. Faster than JS/Python because there's no context switch.
Setup
- Make sure Use Bind Variables is enabled (default).
- In Mapping, click the pen button to open the field function editor.
- Change the language of the field function editor to SQL.
- Reference any column as {column_name}; flow / global variables as {VARNAME} (uppercase).
Example
LOWER({first_name}) || '-' || LOWER({last_name})
The engine generates the destination INSERT/UPDATE/MERGE with the SQL inlined; bind variables substitute the actual column values at execution time.
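The token substitution the engine performs can be pictured in plain JavaScript. This is a hypothetical illustration only — the real engine uses bind variables rather than inlining literal values:

```javascript
// Hypothetical sketch of {token} substitution into a SQL template.
// The real engine binds values; this inlines them for illustration.
function inlineTokens(sqlTemplate, row) {
  return sqlTemplate.replace(/\{(\w+)\}/g, function (match, name) {
    return name in row ? String(row[name]) : match; // unknown tokens pass through
  });
}

var sql = inlineTokens(
  "LOWER({first_name}) || '-' || LOWER({last_name})",
  { first_name: "Ada", last_name: "Lovelace" }
);
// sql === "LOWER(Ada) || '-' || LOWER(Lovelace)"
```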
Anonymize / randomize a column
etlConfig.randomizer(locale) exposes a built-in anonymizer with locale-aware data providers (names, emails, addresses, …).
Pure random
value = etlConfig.randomizer("en").internet().emailAddress();
Stable (same source value → same anonymous value within the flow)
var randomizer = etlConfig.randomizer("en");
value = randomizer.reuse(fieldValue, randomizer.name().firstName());
Locale examples
etlConfig.randomizer("de").name().firstName();
etlConfig.randomizer("en").name().lastName();
etlConfig.randomizer("en").internet().emailAddress();
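The reuse(...) semantics — the same source value always maps to the same anonymous value — amount to caching the first generated fake per distinct input. A pure-JS sketch (makeStableAnonymizer is a hypothetical helper, not the platform API):

```javascript
// Hypothetical sketch of "stable" anonymization: the first fake value
// generated for a source value is cached and returned on every repeat.
function makeStableAnonymizer(generate) {
  var cache = {};
  return function (sourceValue) {
    var key = String(sourceValue);
    if (!(key in cache)) cache[key] = generate();
    return cache[key];
  };
}

var i = 0;
var anon = makeStableAnonymizer(function () { return "name-" + (i++); });
anon("alice"); // "name-0"
anon("bob");   // "name-1"
anon("alice"); // "name-0" again — same source value, same fake value
```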
Flow control
Run JavaScript or Python as a flow
The dedicated Execute JavaScript or Python flow type runs only your code — no source, no destination. Compose it inside a nested flow alongside other flow types.
Inside the script, etlConfig and scenario are available by name. Useful for orchestration, custom HTTP calls, sending notifications, computing complex inputs for downstream flows.
Conditional step in a workflow
Any workflow step can run conditionally based on a script that returns true (run) or false (skip). Useful when the next step depends on:
- The result of a previous step
- The presence of specific data
- Workflow parameters or variables
- External system state
Configured at the workflow step level — see the platform docs on conditional execution.
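A condition script is just an expression that evaluates to true or false. For instance, running a step only when the previous step produced rows might look like this (the variable names are illustrative, not platform-defined):

```javascript
// Hypothetical condition: run this step only if the previous step
// reported at least one processed row and no error flag is set.
function shouldRun(vars) {
  var rows = parseInt(vars.ROWS_PROCESSED || "0", 10);
  var failed = vars.LAST_STEP_FAILED === "true";
  return rows > 0 && !failed;
}

shouldRun({ ROWS_PROCESSED: "42" });                          // true  → run
shouldRun({ ROWS_PROCESSED: "0" });                           // false → skip
shouldRun({ ROWS_PROCESSED: "5", LAST_STEP_FAILED: "true" }); // false → skip
```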
Loop a flow
A workflow step can loop — running the same flow repeatedly with different parameters or input data. Common cases: paginated APIs, files matching a wildcard, records returned by a query, fixed N iterations. Configured at the workflow step level.
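The paginated-API case reduces to "run the same flow once per page until the cursor runs out". A minimal sketch of that driver logic, with fetchPage standing in for one flow iteration (all names here are hypothetical):

```javascript
// Hypothetical pagination driver: invoke the same "flow" once per page
// until the API stops returning a next-page cursor.
function drainPages(fetchPage) {
  var all = [];
  var cursor = null;
  do {
    var page = fetchPage(cursor); // { items: [...], next: cursorOrNull }
    all = all.concat(page.items);
    cursor = page.next;
  } while (cursor != null);
  return all;
}

// Fake three-page API for illustration
var pages = { start: { items: [1, 2], next: "p2" },
              p2:    { items: [3],    next: "p3" },
              p3:    { items: [4, 5], next: null } };
var items = drainPages(function (c) { return pages[c || "start"]; });
// items → [1, 2, 3, 4, 5]
```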
Dataset transforms
Five transformation phases
Scripting transformations run at specific points in the source-to-destination pipeline. Pick the one that matches when you need to act:
| Phase | When it runs | Typical use |
|---|---|---|
| Before Prepare Source Query | Before the source query is merged with flow variables | Modify the source SQL programmatically; set flow variable values |
| Before Extract | Before extraction starts | Set globals / flow vars, log, prep state |
| For Each Row | For each extracted row | Modify field values on the fly (alternative to per-field Mapping) |
| After Extract | After extraction is complete (entire dataset in memory) | Build a brand-new dataset from the extracted one. Requires streaming disabled. |
| After Load | Last step, after data has loaded into the destination | Cleanup, set vars, log, send notifications |
All five can be used together in the same transformation. Configure under Transformation → MAPPING → Additional Transformations.
Modify the source query before it runs
// Replace the source SQL before it is merged with flow variables
source.setSql('select * from table where id > {VAR_NAME}');
// Set the flow variable referenced by the query
scenario.getVariable('VAR_NAME').setValue(value);
DataSet API — methods you'll actually use
| Method | Purpose |
|---|---|
| getRecordCount() | Number of records in the dataset |
| getFieldCount() | Number of fields (columns) |
| getRecord(index) | Returns a DataSetRecord by index |
| getData() | Array of records |
| getFields() | Array of fields (column metadata) |
| getFieldDef(name) · getFieldDef(index) | Field definition by name or position |
| findField(name) | TypedKeyValue<FieldDef, DataSet> — finds a field and the nested dataset it belongs to (if any) |
| setData(DataSetData) | Replace records |
| setFields(DataSetFields) | Replace fields (columns) |
| getFieldValue(row, column) | Value at row + column position |
| getFieldValue(record, column) | Value from a specific record + column |
| getFieldValue(record, name) | Value from a specific record + field name |
| addField(FieldDef) · addField(name, value) | Add a column |
| setFieldValue(record, column, value) · setFieldValue(row, column, value) | Set a value |
| setValue(record, name, value) | Set by field name; adds the field if it doesn't exist |
| getActualData() · setActualData(Object) | Internal-format access (used for HL7) |
Build a brand-new dataset from the source
Common pattern: traverse the extracted dataset, build a stage dataset with restructured fields (including flattening nested arrays), then replace the live one. Run as After Extract with streaming disabled.
var stageDataSet = new com.toolsverse.etl.common.DataSet();
var rows = dataSet.getRecordCount();
for (var row = 0; row < rows; row++) {
var record = dataSet.getRecord(row);
stageDataSet.addField("Name.FirstName", dataSet.getFieldValue(record, "firstName"));
stageDataSet.addField("Name.LastName", dataSet.getFieldValue(record, "lastName"));
stageDataSet.addField("Age", dataSet.getFieldValue(record, "age"));
stageDataSet.addField("City", dataSet.getFieldValue(record, "city"));
stageDataSet.addField("State", dataSet.getFieldValue(record, "state"));
stageDataSet.addField("ZipCode", dataSet.getFieldValue(record, "postalCode"));
stageDataSet.addField("Addr1", dataSet.getFieldValue(record, "streetAddress"));
// Flatten a nested array
var phones = dataSet.getFieldValue(record, "phoneNumbers");
if (phones != null && phones.getRecordCount() > 0) {
var n = phones.getRecordCount();
for (var phoneRow = 0; phoneRow < n; phoneRow++) {
stageDataSet.addField("Phone.Type",
dataSet.getFieldValue(record, "phoneNumbers.phoneNumbers_type", '.', phoneRow));
stageDataSet.addField("Phone.Num",
dataSet.getFieldValue(record, "phoneNumbers.phoneNumbers", '.', phoneRow));
}
}
}
dataSet.setFields(stageDataSet.getFields());
dataSet.setData(stageDataSet.getData());
Lookup
Extract & lookup — query an external source per row
Pull a single value or an entire dataset from any connection while processing source records. Use the source connection (SELECT …), the destination connection (destination.SELECT …), or a third connection by name (connection_name.SELECT …).
Single field value — Extractor.lookup(...)
value = com.toolsverse.etl.core.engine.Extractor.lookup(
etlConfig, scenario,
"connection_for_directors_db", // connection name
"directors", // dataset name (any string)
"first_name", // column to return
"SELECT first_name FROM directors WHERE id = {ID}", // sql, can be null
new com.toolsverse.util.TypedKeyValue("ID",
dataSet.getFieldValue(currentRow, 'director_id'))
);
Entire dataset — same call, different signature
var images = com.toolsverse.etl.core.engine.Extractor.lookup(
etlConfig, scenario,
"connection for lookup",
"image",
"SELECT filename AS url FROM property_pictures WHERE id = {ID}",
new com.toolsverse.util.TypedKeyValue("ID",
dataSet.getFieldValue(currentRow, 'picture_id'))
);
Cached vs non-cached
| Use | When |
|---|---|
| Extractor.lookup(...) | Performance — cache results across rows |
| Extractor.lookupNoCache(...) | Data changes between calls |
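The difference between the two can be pictured as a memoization layer around the uncached call. This is a hypothetical sketch of the semantics, not the platform implementation:

```javascript
// Hypothetical sketch: lookup(...) behaves like lookupNoCache(...)
// wrapped in a per-key cache that survives across rows.
function withCache(lookupNoCache) {
  var cache = {};
  return function (key) {
    if (!(key in cache)) cache[key] = lookupNoCache(key);
    return cache[key];
  };
}

var calls = 0;
var lookup = withCache(function (id) { calls++; return "director-" + id; });
lookup(7); lookup(7); lookup(9);
// calls === 2 — the repeated key hit the cache
```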
Callable SQL lookup
For lookups that need stored procedures or PL/SQL-style blocks, use Extractor.lookupWithCallable(...):
var sql = 'begin\n' +
' select country from country where country_id = {country_id}\n' +
'end';
value = com.toolsverse.etl.core.engine.Extractor.lookupWithCallable(
etlConfig, scenario,
'SQL Server',
'country',
'country',
sql,
new com.toolsverse.util.TypedKeyValue('country_id',
dataSet.getFieldValue(currentRow, 'country_id'))
);
In-memory lookups (already-loaded datasets)
If the lookup data is already loaded in memory (a JSON / XML / CSV source, an API response), there's no need to round-trip through a database.
SQL on an in-memory dataset
// Single field
value = com.toolsverse.etl.common.CommonEtlUtils.getFieldValue(
scenario.getSources().get("DIRECTORS").getDataSet(),
"select last_name where id = {ID}",
"last_name",
new com.toolsverse.util.TypedKeyValue("ID",
dataSet.getFieldValue(currentRow, 'director_id'))
);
// Entire dataset
var phones = com.toolsverse.etl.common.CommonEtlUtils.executeSql(
scenario.getSources().get("PHONES").getDataSet(),
"select * where area_code=412"
);
Filter expression on an in-memory dataset
Faster than SQL for flat datasets:
// Single field
value = com.toolsverse.etl.common.CommonEtlUtils.lookup(
scenario.getSources().get("DIRECTORS").getDataSet(),
"id=" + dataSet.getFieldValue(currentRow, 'director_id'),
"last_name"
);
// Entire dataset
var phones = com.toolsverse.etl.common.CommonEtlUtils.filter(
scenario.getSources().get("PHONES").getDataSet(),
"area_code=412"
);
Filter & validate
Simple filter expression
Available under MAPPING → Additional Transformations → Filter. Reject rows by expression. Field values are accessible by name ("field name").
| Mode | Operators |
|---|---|
| JavaScript syntax | !=, &&, ||, === |
| Use SQL like syntax enabled | <>, and, or, like |
"Importer Name" != null && "Importer Name".contains("test");
"Importer Name" <> null and "Importer Name".contains("test")
Returns true → keep the row, false → drop it.
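In plain JavaScript terms the expression is a per-row predicate. A mock evaluation of the expressions above, with row standing in for field-by-name access (indexOf replaces the platform's Java-string contains):

```javascript
// Hypothetical stand-in for the filter: keep rows whose "Importer Name"
// is present and contains "test" (mirrors the expressions above).
function keepRow(row) {
  var name = row["Importer Name"];
  return name != null && name.indexOf("test") >= 0;
}

keepRow({ "Importer Name": "test import" }); // true  → row kept
keepRow({ "Importer Name": "acme" });        // false → row dropped
keepRow({});                                 // false → row dropped
```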
Filter / modify rows in a source CSV file
Configurable in CSV Format. The runtime injects a filter object you control via getters/setters:
var javaImports = new JavaImporter(com.toolsverse.etl.connector.text);
with (javaImports) {
// Skip the second line (actual index is zero-based)
if (filter.getActualIndex() == 1) {
filter.setCode(TextFilter.TextFilterCode.REJECT);
filter.setIndex(1);
}
}
Codes: STOP (halt parsing), REJECT (skip line), CONTINUE (default).
Validate — TaskResult codes
Available under MAPPING → Additional Transformations → Validate. Different from Filter: you can halt the entire flow, reject the entire dataset, or just drop a single row.
| Code | Effect |
|---|---|
| TaskResult.HALT | Stop the flow with an exception |
| TaskResult.STOP | Skip the source-to-destination transformation entirely (no rows loaded) |
| TaskResult.REJECT | Skip the current row only |
| TaskResult.CONTINUE | Accept the row (default) |
Reject a row when required fields are missing or malformed
var postalCode = dataSet.getFieldValue(currentRow, 'postal_code');
var phone = dataSet.getFieldValue(currentRow, 'phone');
if (Utils.isNothing(postalCode) || Utils.isNothing(phone) || !Utils.isNumber(postalCode)) {
value = TaskResult.REJECT;
} else {
value = TaskResult.CONTINUE;
}
Swap REJECT for HALT to stop the entire flow, or STOP to skip the whole transformation.
Source messages
Preprocess the source message
Modify the raw source document before Etlworks parses it — useful for fixing encoding, supplying default content for empty payloads, or stringifying nested nodes.
Setup
- Create an XML, JSON, or CSV Format.
- Set Transformation type to preprocessor.
- Write JavaScript that modifies the message variable.
// Strip the first 12 chars (e.g. a header byte-order mark or framing)
value = message.substring(12);
// Or rely on last-expression
message.substring(12);
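Another preprocessor pattern mentioned above — supplying default content for an empty payload — is a one-liner. The JSON default shown here is illustrative; use whatever empty document your Format expects:

```javascript
// Hypothetical default-content preprocessor: if the incoming payload is
// empty or whitespace, substitute an empty JSON document so parsing succeeds.
function withDefault(message) {
  return (message == null || message.trim() === "") ? '{"rows":[]}' : message;
}

withDefault("");                   // '{"rows":[]}'
withDefault('{"rows":[{"a":1}]}'); // passed through unchanged
```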
Store the original source message
Sometimes you want both the parsed dataset and the raw message. Stash the raw message in a global variable from the preprocessor, then read it anywhere downstream:
com.toolsverse.config.SystemConfig.instance().getProperties()
.put("raw_source", message);
value = message;
var raw = com.toolsverse.config.SystemConfig.instance().getProperties()
.get("raw_source");
Serialize
DataSet → JSON / XML / CSV string
One call, any of the three formats. Useful when you want to keep a nested node as a single string field, or generate a payload to send via HTTP.
var str = com.toolsverse.etl.connector.ConnectorUtils.dataSet2Str(
dataSet,
'fully.qualified.connector.class.name',
parameters // string "k=v;k=v" or null for defaults
);
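The parameters argument is a flat "k=v;k=v" string. Parsing one into an object can be sketched as follows (parseParams is a hypothetical helper for illustration):

```javascript
// Hypothetical helper for the "k=v;k=v" parameters format.
function parseParams(str) {
  var out = {};
  if (!str) return out;
  str.split(";").forEach(function (pair) {
    var i = pair.indexOf("=");
    if (i > 0) out[pair.slice(0, i)] = pair.slice(i + 1);
  });
  return out;
}

var p = parseParams("start=variable;nested=variable;forsql=false");
// p → { start: "variable", nested: "variable", forsql: "false" }
```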
Connectors
| Format | Class |
|---|---|
| JSON | com.toolsverse.etl.connector.json.JsonConnector |
| XML | com.toolsverse.etl.connector.xml.XmlObjectConnector |
| CSV | com.toolsverse.etl.connector.text.TextConnector |
JSON parameters
| Parameter | Values | Default |
|---|---|---|
| start | array · object · namedobject · variable | array |
| rootname | any string | null |
| nested | array · object · variable | array |
| forsql | true / false | false |
XML parameters
| Parameter | Values | Default |
|---|---|---|
| rootname | any string | automatic |
| namespace | valid namespace | null |
| xmlversion | 1.0 / 1.1 | 1.0 |
| rowtag | valid XML tag name | automatic |
| parseattrs · parserootattrs · parsecdata · forsql | true / false | see article |
CSV parameters
| Parameter | Values | Default |
|---|---|---|
| delimiter | any character | , |
| lineseparator | u Unix · w Windows · s automatic | s |
| charseparator | any character | " |
| alwaysquote · escapequote · forsql | true / false | false |
Examples
var jsonStr = com.toolsverse.etl.connector.ConnectorUtils.dataSet2Str(dataSet,
'com.toolsverse.etl.connector.json.JsonConnector',
'start=variable;nested=variable');
var xmlStr = com.toolsverse.etl.connector.ConnectorUtils.dataSet2Str(dataSet,
'com.toolsverse.etl.connector.xml.XmlObjectConnector',
'rootname=products;rowtag=product');
var csvStr = com.toolsverse.etl.connector.ConnectorUtils.dataSet2Str(dataSet,
'com.toolsverse.etl.connector.text.TextConnector',
'alwaysquote=true;escapequote=true');
Inside a SELECT
Etlworks can run SQL against any data object. You can serialize a nested dataset directly inside a SELECT:
SELECT com.toolsverse.etl.connector.ConnectorUtils.dataSet2Str(
dataSet, 'fully.qualified.connector.class.name', parameters
) AS field_name,
other_fields
FROM …
In SQL context, the = and ; characters inside the parameters string must be escaped so the SQL parser doesn't consume them; see the platform docs on serialization inside a SELECT for the exact escape sequences.
JSON / XML / CSV string → DataSet
The inverse of above — ingest a string in any of the three formats and get a DataSet:
var dataSet = com.toolsverse.etl.connector.ConnectorUtils.str2DataSet(
null, // pass null for the EtlConfig param
'fully.qualified.connector.class.name',
parameters, // same as for dataSet2Str
encodedStr // the input string
);
Connectors and parameters are identical to dataSet2Str.
SQL
Conditional SQL action per row
For database destinations, decide INSERT / UPDATE / DELETE / MERGE / nothing per row. Configure under MAPPING → Parameters → Action Conditions.
// If column "Types" equals 'U', update; otherwise insert
{Types}.equals('U') ? 'update' : 'insert';
Set the action to 'nothing' to skip the record entirely. Available actions: 'insert', 'update', 'delete', 'merge', 'nothing'.
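The condition is an ordinary expression returning one of the action strings. A slightly richer chooser, with illustrative column names (the platform evaluates an expression like this per row):

```javascript
// Hypothetical action condition: update existing rows, delete tombstones,
// skip rows without a key, insert everything else.
function chooseAction(row) {
  if (row.id == null) return 'nothing';
  if (row.op === 'D')  return 'delete';
  if (row.op === 'U')  return 'update';
  return 'insert';
}

chooseAction({ id: 1, op: 'U' }); // 'update'
chooseAction({ id: 2, op: 'D' }); // 'delete'
chooseAction({ op: 'U' });        // 'nothing' — no key, skip the record
chooseAction({ id: 3 });          // 'insert'
```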
Execute SQL from JavaScript or Python
Run arbitrary SQL against any connection wired into the flow — useful for status updates inside complex nested flows, or for writing rejected records to an errors table.
var ex = etlConfig.getLastException();
etlConfig.setLastException(null);
var props = com.toolsverse.config.SystemConfig.instance().getContextProperties();
var stagingConnection = etlConfig.getConnection("Staging Postgres");
var statusSql = "insert into feeds_status (filename, url, error, finished) " +
"values (?,?,?,?)";
var message = null;
if (ex != null) {
message = ex.getMessage();
if (Utils.isNothing(message)) {
message = com.toolsverse.util.Utils.getStackTraceAsString(ex);
}
}
var when = new java.util.Date();
try {
com.toolsverse.etl.sql.util.SqlUtils.executeSql(
stagingConnection, statusSql,
props.get('filename'),
props.get('url'),
message,
when
);
} catch (e) { // different name so the outer ex isn't shadowed
etlConfig.log(e.toString());
}
Required imports: com.toolsverse.config, com.toolsverse.etl.sql.util. The connection must be part of the flow (either used in a transformation, or added as a named connection on the nested flow).
Connections in code
Load text data from any connection
Named connection
var images = com.toolsverse.etl.core.task.common.FileManagerTask.read(
etlConfig, "connection_name", null
);
On-the-fly connection
var alias = new com.toolsverse.etl.common.Alias();
alias.setUrl("https://example.com/api/images");
alias.setTransport('http');
var images = com.toolsverse.etl.core.task.common.FileManagerTask.read(alias, null);
The last parameter is a filename — pass null for HTTP, use the actual filename for file connections.
Load binary data — readAsStream
Returns a java.io.InputStream. Common pattern: load an image from one connection, transform it, save to another.
var imageStream = null;
var imageStreamToSave = null;
var os = null;
try {
imageStream = com.toolsverse.etl.core.task.common
.FileManagerTask.readAsStream(etlConfig, 'image', null);
var image = com.toolsverse.imaging.ImageProcessor.instance().loadImage(imageStream);
var newImage = com.toolsverse.imaging.ImageProcessor.instance()
.scale(image, Java.to([0.5, 0.5], "java.lang.Float[]"));
os = new java.io.ByteArrayOutputStream();
javax.imageio.ImageIO.write(newImage, "png", os);
imageStreamToSave = new java.io.ByteArrayInputStream(os.toByteArray());
var alias = new com.toolsverse.etl.common.Alias();
alias.setUrl("logo.png");
alias.setTransport('file');
com.toolsverse.etl.core.task.common
.FileManagerTask.write(alias, null, imageStreamToSave);
} finally {
if (imageStream != null) imageStream.close();
if (os != null) os.close();
if (imageStreamToSave != null) imageStreamToSave.close();
}
Use a named connection in JavaScript
For a connection to be referenceable from JavaScript, it must be part of the flow:
- If the connection is part of a source-to-destination transformation, it's automatically available in the flow itself, in any nested flow that includes it, and in all inner flows.
- Otherwise, add it manually as a named connection on the nested flow.
HTML scraping with jsoup
Etlworks bundles jsoup — a full HTML5 parser. Load a page, parse, traverse the DOM, extract.
JavaScript
var html = com.toolsverse.etl.core.task.common.FileManagerTask.read(
etlConfig, 'html', null
);
var doc = org.jsoup.Jsoup.parse(html);
var files = new java.util.ArrayList();
etlConfig.setValue('files', files);
var root = doc.select("h3 > a[name=annualproj]").get(0).parent().nextElementSibling();
var spans = root.select("ul.numbered li > span.formats");
for each (var span in spans) {
var link = span.nextElementSibling();
var href = link.attr("href").toString();
var title = link.attr("title").toString();
title = title.substring(title.indexOf('.') + 1).trim();
files.add(new com.toolsverse.util.TypedKeyValue(href, title));
etlConfig.log("href: " + href + ", title: " + title);
}
Python (same flow)
from org.jsoup import Jsoup
from com.toolsverse.etl.core.task.common import FileManagerTask
from com.toolsverse.util import TypedKeyValue, Utils
from java.util import ArrayList
html = FileManagerTask.read(etlConfig, 'html', None)
doc = Jsoup.parse(html)
files = ArrayList()
etlConfig.setValue('files', files)
root = doc.select("h3 > a[name=annualproj]").get(0).parent().nextElementSibling()
spans = root.select("ul.numbered li > span.formats")
for span in spans:
link = span.nextElementSibling()
href = link.attr("href")
title = link.attr("title")
title = title[title.index(".") + 1:].strip()
files.add(TypedKeyValue(href, title))
etlConfig.log("href: " + href + ", title: " + title)
Utilities
Sequence generators
UniqueNumber — in-process, timestamp-based
Cache the same instance in etlConfig's key/value storage so the sequence stays consistent across invocations within a flow.
var uniqueNumber = etlConfig.getValue("uniqueNumber");
if (uniqueNumber == null) {
uniqueNumber = new com.toolsverse.util.UniqueNumber();
etlConfig.setValue("uniqueNumber", uniqueNumber);
}
value = uniqueNumber.nextNumber();
Algorithm: (System.currentTimeMillis() << 20) + n, where n is incremented each call.
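That algorithm can be illustrated with BigInt — JavaScript's << truncates to 32 bits, so plain numbers would overflow (makeUniqueNumber is a hypothetical re-implementation for illustration, not the platform class):

```javascript
// Hypothetical sketch of the UniqueNumber scheme:
// (currentTimeMillis << 20) + counter. BigInt avoids 32-bit overflow.
function makeUniqueNumber() {
  var n = 0n;
  return function nextNumber() {
    return (BigInt(Date.now()) << 20n) + (n++);
  };
}

var next = makeUniqueNumber();
var a = next(), b = next();
// a < b — values are strictly increasing within a process
```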
RedisSequenceGenerator — cluster-wide
Backed by Redis — safe across parallel flows. Up to 10K numbers/sec.
// key = anything (e.g. table name), startValue = starting sequence number
var seq = com.toolsverse.io.RedisSequenceGenerator.instance()
.nextNumber('my_table_name', 1);
Share parameters across flows
Variables on a flow are local to it (and its nested children). To share state across independent flows — e.g. parallel runners, sequential dependent flows, real-time stats collected via a webhook + listener — use the Redis-backed parameter store.
Set / get many at once
var params = new java.util.HashMap();
params.put("key1", "value1");
params.put("key2", "value2");
// anyid: any string (auditId / flowId / both); expireInSec: 1..86400 (24h max)
com.toolsverse.io.RedisSequenceGenerator.instance()
.setParameters("anyid", params, expireInSec);
// key can be null — uses wildcard *
var got = com.toolsverse.io.RedisSequenceGenerator.instance()
.getParameters("anyid", null);
var v1 = got.get("key1");
Set / get a single value
com.toolsverse.io.RedisSequenceGenerator.instance()
.setParameter("anyid", "key1", "value1", expireInSec);
var v = com.toolsverse.io.RedisSequenceGenerator.instance()
.getParameter("anyid", "key1");
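The store's semantics can be pictured with a plain map plus expiry. This is an in-memory model for illustration only — the real store is Redis-backed and shared cluster-wide:

```javascript
// Hypothetical in-memory model of the parameter store: values are
// namespaced by id and dropped after expireInSec seconds.
function makeParamStore(now) {
  var data = {};
  return {
    setParameter: function (id, key, value, expireInSec) {
      data[id + "/" + key] = { value: value, expiresAt: now() + expireInSec * 1000 };
    },
    getParameter: function (id, key) {
      var e = data[id + "/" + key];
      return e && e.expiresAt > now() ? e.value : null;
    }
  };
}

var t = 0; // fake clock, in millis
var store = makeParamStore(function () { return t; });
store.setParameter("flow-1", "key1", "value1", 60);
store.getParameter("flow-1", "key1"); // "value1"
t = 61 * 1000;
store.getParameter("flow-1", "key1"); // null — expired
```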
Recommended keys
// Audit ID for the currently running flow
var auditId = com.toolsverse.config.SystemConfig.instance()
.getEtlThreadContext().getRequestId();
// Flow ID
var flowId = com.toolsverse.config.SystemConfig.instance()
.getEtlThreadContext().getFlowId();
When collecting parameters via a webhook payload, the auditId is in JSON.parse(message).entity.auditId.
An on-prem agent runs on a different network and can't reach Redis directly. Workaround: have a flow on the agent POST to /etl/rest/v1/parameters on the host instance via an Etlworks API connection. The platform writes the parameters to Redis on the agent's behalf.
Working with dates
Dates are java.util.Date under the hood. The most useful helpers:
| Need | Code |
|---|---|
| Current timestamp as Date | new java.util.Date() |
| Current timestamp in millis | System.currentTimeMillis() |
| Format a Date as a string | com.toolsverse.util.Utils.date2Str(date, 'yyyy-MM-dd HH:mm:ss') |
| Parse a string into a Date | com.toolsverse.util.Utils.str2Date(string, defaultDate, format) |
| Reformat between two string formats | Utils.date2Str(Utils.str2Date(s, null, srcFmt), dstFmt) |
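The reformat pattern in the last row can be illustrated in plain JavaScript for one fixed pair of formats. The platform helpers accept any SimpleDateFormat pattern; this hypothetical sketch hard-codes 'yyyy-MM-dd' → 'MM/dd/yyyy':

```javascript
// Hypothetical illustration of reformatting between two date string
// formats without constructing a Date: 'yyyy-MM-dd' → 'MM/dd/yyyy'.
function reformatDate(s) {
  var m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(s);
  if (!m) return null; // not in the expected source format
  return m[2] + "/" + m[3] + "/" + m[1];
}

reformatDate("2024-03-09"); // "03/09/2024"
reformatDate("bad input");  // null
```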
Add / subtract
All on com.toolsverse.util.DateUtil. Negative second arg subtracts.
var d1 = com.toolsverse.util.DateUtil.addDays(new java.util.Date(), 1);
var d2 = com.toolsverse.util.DateUtil.addHours(new java.util.Date(), -3);
var d3 = com.toolsverse.util.DateUtil.addMinutes(new java.util.Date(), 30);
var d4 = com.toolsverse.util.DateUtil.addSeconds(new java.util.Date(), 100);
var d5 = com.toolsverse.util.DateUtil.addMillis(new java.util.Date(), 10000);
var d6 = com.toolsverse.util.DateUtil.addMonths(new java.util.Date(), 1);
var d7 = com.toolsverse.util.DateUtil.addYears(new java.util.Date(), 1);
Date in a specific timezone
var calendar = java.util.Calendar.getInstance(
java.util.TimeZone.getTimeZone("EST")
);
var formatted = calendar.get(java.util.Calendar.YEAR) + '/' +
(calendar.get(java.util.Calendar.MONTH) + 1) + '/' + // Calendar.MONTH is zero-based
calendar.get(java.util.Calendar.DATE);
Date format letters (y, M, d, H, m, s, S, z, Z, X, …) follow the standard Java SimpleDateFormat conventions. Timezone IDs are the tz database values. Both full tables are mirrored at Work with dates — ~150 KB of reference content we don't duplicate here.
Home URL of the app
Configured under Settings → General → Home URL. Useful in emails, environment-detection logic, building absolute links.
var homeUrl = etlConfig.getHomeUrl();