AI data integration

Built for the agentic era.

A real agent inside the product. AI in every workflow. And an API so your other agents can use Etlworks as a tool. Not a chatbot bolted on — a data integration platform built for the world where agents and humans work together.

3
AI surfaces
12+
Tools the agent uses
REST
Agent-to-agent API
No
Training on your data

The problem

Most “AI ETL” is just a chatbot.

Vendors race to add “AI” pages. Most ship a sidebar that searches docs and calls it an agent. Real agents use tools — they read your metadata, sample your data, write SQL, run flows, and report back. Most “AI ETL” tools don't.

What's usually under the hood

A wrapper around docs is not an agent.

If the “AI” can answer questions but can't do anything — can't sample your data, can't write a transformation, can't fix a broken pipeline — it's a search box with extra steps. Etlworks built the agent into the engine. It uses real tools, makes real changes, and shows you exactly what it did. Same engine, two paths: your team builds, or the agent builds. Or both.

Capabilities

Three AI surfaces, one platform.

Etlworks's AI story isn't one feature — it's a layered set of capabilities, all sharing the same trust model and access controls.

01 The agent inside the product
Simba — accessible from every screen

Vibe-build flows

Describe a flow in chat. The agent creates connections, mappings, and schedules — you approve before anything runs.

Writes JS & SQL

Generates working transformation code in your languages — not pseudocode, not stubs. Tests in a sandbox before commit.

Reads metadata + data

Inspects mappings, columns, lineage. Samples and validates live data to debug — never trains on your data.

Chat everywhere

Same agent, same context, every screen. Designing a flow? Debugging a run? Reviewing logs? It's there.

Full CLI access

Run, deploy, monitor, manage — same commands as your DevOps team. Scriptable, auditable, version-control friendly.

Builds analytics

Answers pipeline questions asked in plain English — slowest flows today, error trends this week, resource utilization right now.

02 AI in the platform itself
Passive AI — works without prompting

AI-augmented mapping

Suggests source-to-destination column mappings based on schema, names, and sample values. Confidence-scored, you approve.

Schema discovery

Detects table relationships, primary keys, lineage hints from sources you connect. Faster onboarding for unfamiliar systems.

Insights dashboards

AI-generated summaries of pipeline health — error patterns, performance regressions, cost anomalies — surfaced automatically.

03 Etlworks as a subagent
REST API for your AI stack

Direct tool access

Call individual agent tools — search KB, run CLI, import templates — without going through the LLM. Predictable, cheap.

Full agent chat

Send messages, get intelligent responses. The agent picks and chains tools to answer complex questions — same as in the UI.

Subagent integration

Use Etlworks as a subagent in LangChain, CrewAI, AutoGen, or any orchestration framework. A specialist your agents can delegate to.
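The delegation pattern behind this is just a plain callable that most orchestration frameworks can register as a tool. A minimal sketch, assuming the `/chat` endpoint and Bearer auth described in the API section below — the transport is injected as a stub so nothing here is a real client, and the payload field names are illustrative assumptions:

```python
# Sketch: expose the Etlworks agent as a plain Python callable that
# frameworks like LangChain, CrewAI, or AutoGen can wrap as a tool.
# Only the /chat path and Bearer auth come from the docs page;
# the payload shape and transport are illustrative assumptions.
from typing import Callable


def make_etlworks_tool(base_url: str, api_key: str,
                       transport: Callable[[str, dict, dict], str]):
    def etlworks_agent(message: str, session: str = "default") -> str:
        """Delegate a data-integration question or task to the Etlworks agent."""
        headers = {"Authorization": f"Bearer {api_key}",
                   "Content-Type": "application/json"}
        payload = {"session": session, "message": message}
        return transport(f"{base_url}/chat", headers, payload)
    return etlworks_agent


# Stub transport so the sketch runs offline; swap in urllib or requests
# (and your tenant's real base URL) for actual use.
def fake_transport(url: str, headers: dict, payload: dict) -> str:
    return f"agent reply to: {payload['message']}"


tool = make_etlworks_tool("https://api.etlworks.example", "KEY", fake_transport)
reply = tool("Which flows failed last night?")
```

Because the result is an ordinary function, registering it as a tool is one line in most frameworks — the orchestrator sees a specialist it can hand data-integration tasks to.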

Trust & boundaries

You stay in control.

Real agents create real questions about safety, training, and access. Etlworks's answers are clear and built into the platform — not afterthoughts.

Never trains on your data

The agent reads your metadata and samples your data to do its work — that data never leaves your tenant for training. Period.

Every action is opt-in

Each capability — read data, write code, run flows, modify schedules — is enabled per-user, per-environment. Default is read-only.

Approval before write

Write actions — creating connections, modifying flows, running pipelines — require inline human approval before execution.

Full audit log

Every agent action — every prompt, every tool call, every change — is logged with timestamp, user, and outcome. Exportable, queryable.

Disable any tool

Don't want the agent writing SQL in production? Turn it off. Don't want CLI access? Turn it off. Per-tool, per-environment.

Same RBAC as humans

The agent operates with the calling user's permissions. It can't escalate privileges, can't see anything the user can't see, and can't act beyond the user's scope.

Specifications

Tools the agent can use.

A complete inventory of what Simba has access to. Each tool is opt-in, audited, and scoped to the calling user's permissions.

Design & build
create_flow
Generates a complete flow from natural language. Connections, mappings, schedules. User approves before save.
edit_flow
Modifies existing flow definitions. Diffs are shown inline before commit.
write_transform
Writes JS, SQL, Python, or Groovy transformations. Tests in sandbox before staging.
Read & inspect
read_metadata
Inspects connections, mappings, schedules, lineage. Read-only by definition.
sample_data
Pulls representative rows for debugging. Sampled data stays in your tenant — never used for training.
search_knowledge_base
Retrieves Etlworks documentation, best practices, examples. Context-aware to the user's screen.
Run & observe
run_flow
Executes a flow on demand. Requires explicit user confirmation in chat.
cli_command
Runs CLI operations — start, stop, deploy, monitor. Same permission scope as the calling user.
read_logs
Inspects flow runs, errors, durations, audit history. Used for debugging and analytics.
External access (API)
REST endpoints
/chat for full agent · /tools/{name}/execute for direct tool calls · /sessions for multi-turn
Streaming
Server-Sent Events (SSE) — token-by-token responses with tool-call visibility
Auth
API key (Bearer token). No OAuth flow, no token refresh. Per-user scoped.
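Putting the pieces above together, a direct tool call is a single authenticated POST. A minimal sketch that builds (but does not send) the request — the endpoint paths and Bearer auth come from this page, while the base URL and the `arguments` body shape are placeholder assumptions:

```python
# Sketch: build a direct tool-execution request (no LLM in the loop).
# /tools/{name}/execute and Bearer auth are from the spec above;
# BASE_URL and the JSON body shape are illustrative assumptions.
import json
from urllib.request import Request

BASE_URL = "https://api.etlworks.example"  # placeholder -- use your tenant's URL
API_KEY = "YOUR_API_KEY"                   # per-user scoped Bearer token


def build_tool_call(tool_name: str, arguments: dict) -> Request:
    """Construct the POST for /tools/{name}/execute."""
    return Request(
        f"{BASE_URL}/tools/{tool_name}/execute",
        data=json.dumps({"arguments": arguments}).encode(),
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )


req = build_tool_call("search_knowledge_base", {"query": "CDC from Postgres"})
# Send with urllib.request.urlopen(req) once BASE_URL and API_KEY are real.
```

The same pattern applies to `/chat` and `/sessions`; for streaming responses, add an `Accept: text/event-stream` header and consume the SSE body.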

Comparing AI in ETL platforms? See Etlworks vs Matillion Maia, Informatica CLAIRE, and Talend

FAQ

The skeptical questions.

Does Etlworks train on my data?
No. The agent reads your metadata and samples your data to do its work — debug a flow, write a transformation, suggest a mapping — but that data never leaves your tenant for training purposes. We use frontier LLMs from major providers under enterprise agreements that prohibit training on customer data. Your data is yours.
What if the agent generates wrong SQL or breaks a pipeline?
Two safeguards: (1) destructive actions require human approval inline before execution — the agent shows you the diff and waits. (2) Generated transformations are tested in a sandbox against sample data before being saved. If the agent does ship something broken, full audit logs let you trace exactly what happened and revert. Failure mode: a SQL error in the sandbox, not a corrupted production warehouse.
How is this different from Matillion Maia or Informatica CLAIRE?
Maia and CLAIRE are real efforts at agentic ETL — Etlworks isn't dismissive of them. Differences: (1) Etlworks's agent is exposed via API as a subagent for your AI stack — Maia and CLAIRE are platform-internal. (2) Etlworks's agent has full CLI access — most competitors limit the agent to flow-design tasks. (3) Etlworks bundles a monthly agentic-credit allowance into the platform tier instead of metered per-call billing; customers who bring their own OpenAI API key pay nothing to Etlworks for inference. (4) Etlworks ships AI features outside the agent (mapping, discovery, insights). For deeper comparison, see the comparison hub or talk to our team.
Can I bring my own OpenAI API key?
Yes — drop in your own OpenAI API key and the agent runs inference through your OpenAI account. Etlworks doesn't bill you for LLM calls in that mode. OpenAI is the only swappable inference backend today; full BYO-LLM (Azure OpenAI, on-prem Llama / Ollama, etc.) isn't supported. You can still call Etlworks's REST API and CLI from any orchestration framework using any model on your side — but the agent's internal reasoning runs on OpenAI.
What's the cost of using the agent?
Q&A and AI features (in-product chat, mapping suggestions, discovery, insights) are included in your platform tier — no per-token bill for asking questions or using the in-product AI features. Agentic work — the agent actually executing tasks like building flows or running tools — draws on a monthly credit allowance included with your tier. Heavy agentic users can drop in their own OpenAI API key to bypass Etlworks LLM billing entirely. Either way, no per-call surprises.
Can the agent see secrets or credentials?
No. Connection credentials, API keys, and secrets are stored encrypted and only accessible to the runtime — never sent to the LLM. The agent can reference connections by name and configure them via the UI patterns it already knows, but it never receives raw credential values.
How do I integrate Etlworks as a subagent in LangChain or CrewAI?
Etlworks ships Python, bash, and PowerShell clients plus a REST API. All four support both streaming and batch modes. Wrap whichever fits your stack as a tool in LangChain, CrewAI, AutoGen, or any orchestration framework — the agent then runs as a specialist your orchestration delegates data-integration tasks to.
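In streaming mode, the response body is a Server-Sent Events stream: events separated by blank lines, each carrying `data:` fields. A minimal parser sketch — the SSE transport and tool-call visibility come from this page, but the event payload shapes used here (`token`, `tool_call`) are illustrative assumptions, not a documented schema:

```python
# Sketch: decode an SSE stream body into JSON events.
# Per the SSE format, events are separated by blank lines and each
# event's payload is the concatenation of its "data:" field values.
import json


def parse_sse(raw: str):
    """Yield decoded JSON payloads from a Server-Sent Events body."""
    for block in raw.split("\n\n"):
        data_lines = [line[5:].lstrip() for line in block.split("\n")
                      if line.startswith("data:")]
        if data_lines:
            yield json.loads("\n".join(data_lines))


# Hypothetical stream: token chunks interleaved with a tool-call notice.
stream = (
    'data: {"type": "token", "text": "Checking"}\n\n'
    'data: {"type": "tool_call", "tool": "read_logs"}\n\n'
    'data: {"type": "token", "text": " last night\'s runs."}\n\n'
)
events = list(parse_sse(stream))
tokens = "".join(e["text"] for e in events if e["type"] == "token")
```

Batch mode skips all of this: one request, one JSON response — simpler when your orchestrator doesn't need token-by-token output.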

Start your trial

14 days. No card. Real workloads.

Spin up a free trial, talk to the agent, and see what real-tool ETL feels like. Skeptics welcome — the agent has answers.