Databricks integration
ETL, ELT with pushdown, real-time CDC, and Reverse ETL — all into and out of Databricks. Delta Lake and Unity Catalog native. Predictable monthly pricing instead of consumption-based row counting.
The problem
Most ways into the lakehouse cost you twice. Consumption-priced ETL tools bill per row on top of your DBU spend, or you hand-roll ingestion notebooks that someone has to own, schedule, and debug. Either way, getting data into Delta becomes its own project.
Where lakehouse budgets go to die
CDC pipelines, high-velocity SaaS syncs, and large historical backfills are exactly the data you want in Delta, and exactly the workloads that consumption-priced tools charge the most for. Etlworks bills per platform tier, not per record. Load 200 million rows or 200 billion: same monthly cost. And the loading patterns are tuned to keep your Databricks compute up for seconds, not hours.
Capabilities
Stages files in S3, ADLS, or GCS, then runs COPY INTO against Delta tables at warehouse speed. No row-by-row inserts.
Log-based CDC from MySQL, Postgres, SQL Server, Oracle, MongoDB, and DB2, with sub-second latency and MERGE INTO-based deduplication into Delta.
Declare Databricks as the target and Etlworks generates Databricks SQL for in-warehouse transformations — or transform in flight with SQL, JavaScript, or Python.
Push modeled lakehouse data to Salesforce, HubSpot, Marketo, NetSuite, and 200+ SaaS targets. Same platform, same subscription.
Targets Delta tables across Unity Catalog's catalog.schema.table namespace. New columns and type changes propagate automatically — no DDL drift.
Stage-and-load minimizes warehouse-up time. Batched MERGEs, serverless SQL where available, file-based loading — the patterns Databricks recommends.
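For readers who want to see what stage-and-load looks like on the Databricks side, here is a minimal sketch of the kind of COPY INTO statement such a pipeline might issue once files are staged. It is an illustration only, not Etlworks' actual generated SQL: the function name `build_copy_into`, the example bucket, and the choice to enable `mergeSchema` are all assumptions.

```python
# Illustrative only: builds a Databricks SQL COPY INTO statement as a string.
# The helper name and its defaults are hypothetical, not an Etlworks API.

SUPPORTED_FORMATS = {"AVRO", "BINARYFILE", "CSV", "JSON", "ORC", "PARQUET", "TEXT"}


def build_copy_into(table: str, source_uri: str,
                    file_format: str = "PARQUET",
                    merge_schema: bool = True) -> str:
    """Return a COPY INTO statement for a Unity Catalog table and a staged path."""
    parts = table.split(".")
    # Unity Catalog addresses tables as catalog.schema.table.
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected a three-level name: catalog.schema.table")
    if any("`" in p for p in parts) or "'" in source_uri:
        raise ValueError("quote characters are not allowed in names or paths")
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")

    # Backtick-quote each level so unusual identifiers stay valid.
    quoted = ".".join(f"`{p}`" for p in parts)
    lines = [
        f"COPY INTO {quoted}",
        f"FROM '{source_uri}'",
        f"FILEFORMAT = {fmt}",
    ]
    if merge_schema:
        # Lets new columns in the staged files evolve the target schema.
        lines.append("COPY_OPTIONS ('mergeSchema' = 'true')")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical table and bucket names.
    print(build_copy_into("main.sales.orders", "s3://example-bucket/staging/orders/"))
```

The point of the pattern is that a single statement loads every staged file in one pass, so the warehouse only needs to be up for the duration of that statement.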
Patterns
Every Databricks data pipeline pattern, configured the same way. No separate tool for CDC, no separate tool for Reverse ETL, no ingestion notebooks to maintain by hand.
Stage files in cloud storage, then COPY INTO Delta. The pattern Databricks recommends, automated end to end.
Log-based CDC streams change events into Delta via MERGE INTO. Sub-second latency, no Kafka to run.
Push modeled data from Databricks to Salesforce, HubSpot, Marketo, NetSuite — 200+ SaaS targets.
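The CDC pattern above can be sketched as a single batched MERGE INTO that first keeps only the latest change per key, then applies deletes, updates, and inserts. This is a hedged illustration, not the SQL Etlworks actually generates: the helper `build_cdc_merge`, the staging table, and the `op` / `seq` column names (with `'d'` marking a delete) are assumptions made for the example.

```python
# Illustrative only: builds a deduplicating MERGE INTO statement for a batch
# of staged change events. Names and op codes are hypothetical.

def build_cdc_merge(target: str, staged: str, key_cols: list[str],
                    data_cols: list[str], op_col: str = "op",
                    seq_col: str = "seq") -> str:
    """Return a MERGE INTO statement that applies the latest change per key."""
    if not key_cols or not data_cols:
        raise ValueError("key_cols and data_cols must both be non-empty")

    all_cols = list(key_cols) + list(data_cols)
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in key_cols)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in data_cols)
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"s.{c}" for c in all_cols)
    partition = ", ".join(key_cols)

    return "\n".join([
        f"MERGE INTO {target} AS t",
        "USING (",
        "  SELECT * FROM (",
        # Rank changes per key, newest first, so only the last one is applied.
        f"    SELECT *, ROW_NUMBER() OVER (PARTITION BY {partition} "
        f"ORDER BY {seq_col} DESC) AS rn",
        f"    FROM {staged}",
        "  ) AS ranked WHERE rn = 1",
        ") AS s",
        f"ON {on_clause}",
        f"WHEN MATCHED AND s.{op_col} = 'd' THEN DELETE",
        f"WHEN MATCHED THEN UPDATE SET {set_clause}",
        f"WHEN NOT MATCHED AND s.{op_col} <> 'd' THEN "
        f"INSERT ({insert_cols}) VALUES ({insert_vals})",
    ])


if __name__ == "__main__":
    print(build_cdc_merge("main.sales.orders", "main.staging.orders_changes",
                          key_cols=["id"], data_cols=["status", "total"]))
```

Because the statement is keyed on the primary key and only the newest event per key survives the ranking step, replaying the same staged batch should leave the target in the same state, which is one common way to get idempotent application.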
Pricing transparency
Same workload — Salesforce account changes, Postgres orders, hourly SaaS syncs into Delta — priced under three common ETL pricing models. Numbers are approximate, based on public pricing as of 2026, and exclude Databricks compute itself.
Consumption (per-row)
~$8,000/mo
Scales linearly with row volume. Hidden surge pricing during busy months.
Credit-based
~$3,500/mo
Better, but credits expire, and peak-load tier upgrades add cost.
Etlworks (fixed tier)
$1,000/mo
Standard tier, all features, all rows. Predictable for budgets, painless for data teams.
Specifications
Every part of a Databricks pipeline you'd actually run — loading, CDC, catalog, and security — supported and documented.
Comparing data integration platforms? See the ETL platform comparison
FAQ
COPY INTO for bulk loads and MERGE INTO for change data. It connects to a Databricks SQL warehouse or cluster over the Databricks JDBC/SQL connector. No row-by-row inserts.
… MERGE INTO. INSERT, UPDATE, and DELETE are preserved, with sub-second source latency and idempotent application.
… catalog.schema.table). Schema evolution (new columns and type changes) propagates automatically so upstream changes don't break pipelines.
COPY INTO runs a small SQL warehouse for seconds, MERGE operations during CDC are batched, and serverless SQL is used where available. In practice, ETL-driven compute is a small fraction of total DBU spend; analytics queries dominate. And because Etlworks bills per tier, not per row, your integration cost stays flat as volumes grow.
Start your trial
Spin up a free trial, point it at your Databricks workspace, and load production data into Delta. See what predictable ingestion pricing actually feels like.