Databricks integration

Load the lakehouse. Both ways.

ETL, ELT with pushdown, real-time CDC, and Reverse ETL — all into and out of Databricks. Delta Lake and Unity Catalog native. Predictable monthly pricing instead of consumption-based row counting.

Start free trial Talk to us

Delta: Native load + MERGE
2-way: In + Reverse ETL
<1s: CDC latency
No: Per-row billing

The problem

Databricks compute is metered. Your ingestion bill shouldn't pile on.

Most ways into the lakehouse cost you twice. Consumption-priced ETL tools bill per row on top of your DBU spend, or you hand-roll ingestion notebooks that someone has to own, schedule, and debug. Either way, getting data into Delta becomes its own project.

Where lakehouse budgets go to die

Per-row pricing punishes the workloads that justify the lakehouse.

CDC pipelines, high-velocity SaaS syncs, and large historical backfills are exactly the data you want in Delta, and they are the workloads that cost the most on consumption-priced tools. Etlworks bills per platform tier, not per record. Load 200 million rows or 200 billion; the monthly cost is the same. And the loading patterns are tuned to keep your Databricks compute up for seconds, not hours.

Capabilities

Lakehouse-native, end to end.

Bulk load into Delta

Stages files in S3, ADLS, or GCS, then runs COPY INTO against Delta tables at warehouse speed. No row-by-row inserts.
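As a rough illustration of the stage-and-load step, the sketch below builds a Databricks COPY INTO statement for files already staged in cloud storage. It is not Etlworks' generated SQL; the function names, the bucket path, and the table names are hypothetical, and only the COPY INTO clauses themselves come from Databricks SQL.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote one identifier part, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def copy_into_sql(catalog, schema, table, staged_path,
                  file_format="PARQUET", format_options=None):
    """Build a COPY INTO statement for files staged under staged_path.

    Hypothetical helper: the formats allowed here mirror the ones listed
    on this page (CSV, JSON, Parquet, Avro).
    """
    fmt = file_format.upper()
    if fmt not in {"CSV", "JSON", "PARQUET", "AVRO"}:
        raise ValueError(f"unsupported file format: {file_format}")
    if "'" in staged_path:
        raise ValueError("staged path must not contain single quotes")

    # Unity Catalog three-level name: catalog.schema.table
    target = ".".join(quote_ident(p) for p in (catalog, schema, table))
    lines = [
        f"COPY INTO {target}",
        f"FROM '{staged_path}'",
        f"FILEFORMAT = {fmt}",
    ]
    if format_options:
        opts = ", ".join(
            f"'{k}' = '{v}'" for k, v in sorted(format_options.items())
        )
        lines.append(f"FORMAT_OPTIONS ({opts})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example values only; the bucket and table do not exist.
    print(copy_into_sql("main", "sales", "orders",
                        "s3://example-bucket/stage/orders/",
                        "csv", {"header": "true"}))
```

One statement loads every staged file under the path, which is why this pattern avoids row-by-row inserts.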

Real-time CDC into Databricks

Log-based CDC from MySQL, Postgres, SQL Server, Oracle, MongoDB, and DB2, with sub-second latency and MERGE INTO-based deduping into Delta.
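To make the MERGE-based deduping concrete, here is a minimal sketch, assuming each change event carries a key, a monotonically increasing sequence number, and an operation code where 'd' means delete. Those field names and the op code are assumptions for illustration, not Etlworks' event format. The first function collapses a batch to the newest event per key; the second builds a MERGE INTO statement that applies inserts, updates, and deletes.

```python
def latest_per_key(events, key="id", seq="seq"):
    """Keep only the newest change event for each key.

    Replaying the same events yields the same result, which is what
    makes applying the batch idempotent.
    """
    latest = {}
    for event in events:
        k = event[key]
        if k not in latest or event[seq] > latest[k][seq]:
            latest[k] = event
    return list(latest.values())


def merge_sql(target, source, key, columns, op_column="op"):
    """Build a MERGE INTO statement that preserves INSERT/UPDATE/DELETE.

    `columns` must include the key and at least one other column.
    The 'd' delete marker is an assumed convention.
    """
    non_key = [c for c in columns if c != key]
    if not non_key:
        raise ValueError("need at least one non-key column to update")
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in non_key)
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"s.{c}" for c in columns)
    return "\n".join([
        f"MERGE INTO {target} AS t",
        f"USING {source} AS s",
        f"ON t.{key} = s.{key}",
        f"WHEN MATCHED AND s.{op_column} = 'd' THEN DELETE",
        f"WHEN MATCHED THEN UPDATE SET {set_clause}",
        f"WHEN NOT MATCHED AND s.{op_column} <> 'd' THEN "
        f"INSERT ({insert_cols}) VALUES ({insert_vals})",
    ])


if __name__ == "__main__":
    events = [
        {"id": 1, "seq": 1, "op": "c", "status": "new"},
        {"id": 1, "seq": 3, "op": "u", "status": "paid"},
        {"id": 2, "seq": 2, "op": "d", "status": None},
    ]
    print(latest_per_key(events))
    print(merge_sql("main.sales.orders", "orders_changes",
                    "id", ["id", "status"]))
```

Collapsing before the MERGE matters because MERGE INTO fails when several source rows match the same target row.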

ELT with pushdown

Declare Databricks as the target and Etlworks generates Databricks SQL for in-warehouse transformations — or transform in flight with SQL, JavaScript, or Python.

Reverse ETL out of Databricks

Push modeled lakehouse data to Salesforce, HubSpot, Marketo, NetSuite, and 200+ SaaS targets. Same platform, same subscription.

Unity Catalog & schema evolution

Targets Delta tables across Unity Catalog's catalog.schema.table namespace. New columns and type changes propagate automatically — no DDL drift.
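The following sketch shows one way schema drift could be detected and turned into DDL. It is an assumption-laden illustration, not Etlworks' implementation: it compares two hypothetical column-to-type mappings, emits an ALTER TABLE ... ADD COLUMNS statement for new columns, and only reports type changes rather than generating DDL for them.

```python
def plan_schema_changes(table, target_cols, incoming_cols):
    """Compare target and incoming schemas.

    Returns (ddl, changed_types):
      ddl            ALTER TABLE statement adding new columns, or None
      changed_types  sorted names of columns whose type differs
    Both inputs are dicts of column name -> type name (hypothetical shape).
    """
    new_cols = {
        name: col_type
        for name, col_type in incoming_cols.items()
        if name not in target_cols
    }
    changed_types = sorted(
        name
        for name, col_type in incoming_cols.items()
        if name in target_cols
        and target_cols[name].upper() != col_type.upper()
    )
    ddl = None
    if new_cols:
        cols = ", ".join(f"{n} {t}" for n, t in new_cols.items())
        ddl = f"ALTER TABLE {table} ADD COLUMNS ({cols})"
    return ddl, changed_types


if __name__ == "__main__":
    target = {"id": "BIGINT", "amount": "INT"}
    incoming = {"id": "BIGINT", "amount": "BIGINT", "coupon": "STRING"}
    # Table name uses the catalog.schema.table form; values are examples.
    print(plan_schema_changes("main.sales.orders", target, incoming))
```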

Compute cost optimization

Stage-and-load minimizes warehouse-up time. Batched MERGEs, serverless SQL where available, file-based loading — the patterns Databricks recommends.

Patterns

Three flows, one platform.

Every Databricks data pipeline pattern, configured the same way. No separate tool for CDC, no separate tool for Reverse ETL, no ingestion notebooks to maintain by hand.

Data in

Source → ADLS / S3 → Delta

Bulk ETL / ELT

Stage files in cloud storage, then COPY INTO Delta. The pattern Databricks recommends, automated end to end.

Real-time

DB → Log CDC → MERGE INTO

CDC into Delta

Log-based CDC streams change events into Delta via MERGE INTO. Sub-second latency, no Kafka to run.

Reverse

Delta → Transform → SaaS

Reverse ETL out

Push modeled data from Databricks to Salesforce, HubSpot, Marketo, NetSuite — 200+ SaaS targets.

Pricing transparency

A typical 50M-row pipeline, three ways.

Same workload — Salesforce account changes, Postgres orders, hourly SaaS syncs into Delta — priced under three common ETL pricing models. Numbers are approximate, based on public pricing as of 2026, and exclude Databricks compute itself.

Consumption (per-row)

~$8,000/mo

Scales linearly with row volume. Hidden surge pricing during busy months.

Credit-based

~$3,500/mo

Better, but credits expire, and peak-load tier upgrades add cost.

Etlworks (fixed tier)

$1,000/mo

Standard tier, all features, all rows. Predictable for budgets, painless for data teams.
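The arithmetic behind the comparison can be sketched as follows, using only the approximate figures quoted above and assuming the per-row model scales strictly linearly with no volume discounts. The credit-based model is left out because its scaling is not described here.

```python
# Approximate figures from the comparison above (USD per month).
BASELINE_ROWS = 50_000_000
PER_ROW_BASELINE_USD = 8_000
FIXED_TIER_USD = 1_000


def per_row_cost(rows: int) -> float:
    """Monthly cost under a strictly linear per-row model (assumption)."""
    return rows * PER_ROW_BASELINE_USD / BASELINE_ROWS


def fixed_tier_cost(rows: int) -> float:
    """Monthly cost under a fixed tier: independent of row volume."""
    return float(FIXED_TIER_USD)


if __name__ == "__main__":
    for rows in (50_000_000, 100_000_000, 200_000_000):
        print(rows, per_row_cost(rows), fixed_tier_cost(rows))
```

Under these assumptions, quadrupling volume from 50M to 200M rows quadruples the per-row bill while the fixed tier stays where it was.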

Specifications

Databricks integration depth.

Every part of a Databricks pipeline you'd actually run — loading, CDC, catalog, and security — supported and documented.

Staging: S3, Azure Blob / ADLS, Google Cloud Storage · auto-managed lifecycle
Load methods: COPY INTO for bulk · MERGE INTO for change data · Delta Lake tables
File formats: CSV, JSON, Parquet, Avro · staged then loaded

CDC & transforms

CDC into Databricks: MERGE INTO deduping · INSERT/UPDATE/DELETE preserved · idempotent
ELT pushdown: Databricks SQL generated for in-warehouse transformations · dbt-friendly
In-flight transforms: SQL, JavaScript, Python · applied during load

Catalog, security & auth

Unity Catalog: catalog.schema.table namespace · automatic schema evolution
Connection: SQL warehouse or cluster · Databricks JDBC / SQL connector
Authentication: Personal access token or OAuth (incl. M2M service principals) · IP access lists / PrivateLink honored

Comparing data integration platforms? See the ETL platform comparison

FAQ

Common questions.

How does Etlworks load data into Databricks?

Etlworks stages files in cloud storage (S3, Azure Blob / ADLS, or GCS) and loads them into Delta tables — using COPY INTO for bulk loads and MERGE INTO for change data. It connects to a Databricks SQL warehouse or cluster over the Databricks JDBC/SQL connector. No row-by-row inserts.

Does Etlworks support real-time CDC into Databricks?

Yes. Log-based CDC from MySQL, Postgres, SQL Server, Oracle, MongoDB, and DB2 streams change events into Delta tables via MERGE INTO. INSERT, UPDATE, and DELETE are preserved, with sub-second source latency and idempotent application.

Does Etlworks work with Unity Catalog and Delta Lake?

Yes. Targets are Delta Lake tables, and Etlworks honors Unity Catalog's three-level namespace (catalog.schema.table). Schema evolution — new columns and type changes — propagates automatically so upstream changes don't break pipelines.

Can Etlworks push down transformations to Databricks (ELT)?

Yes. ELT pushdown is supported for Databricks alongside Snowflake, BigQuery, Redshift, and Synapse. The flow declares the target, and Etlworks generates Databricks SQL for in-warehouse transformations. You can also transform in flight with SQL, JavaScript, or Python, and Etlworks is dbt-friendly, dbt-optional.

How does Etlworks authenticate to Databricks?

Via a Databricks personal access token or OAuth (including machine-to-machine service principals). The connection targets a SQL warehouse or cluster endpoint. Network controls such as IP access lists and PrivateLink are honored where configured on the workspace.
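For orientation, a personal-access-token connection over the Databricks JDBC driver is typically expressed as a URL like the one built below. This is a sketch of the driver's URL format, not of how Etlworks stores its connection settings; the hostname, HTTP path, and token shown are placeholders.

```python
def databricks_jdbc_url(host: str, http_path: str, token: str) -> str:
    """Build a Databricks JDBC URL for personal-access-token auth.

    AuthMech=3 with UID=token tells the driver that PWD carries a
    personal access token. All argument values are caller-supplied.
    """
    if not host or not http_path or not token:
        raise ValueError("host, http_path and token are all required")
    return (
        f"jdbc:databricks://{host}:443;"
        f"httpPath={http_path};"
        "AuthMech=3;UID=token;"
        f"PWD={token}"
    )


if __name__ == "__main__":
    # Placeholder values only; never hard-code a real token.
    print(databricks_jdbc_url(
        "example.cloud.databricks.com",
        "/sql/1.0/warehouses/0123456789abcdef",
        "<personal-access-token>",
    ))
```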

Will Etlworks drive up my Databricks compute costs?

Etlworks's loading patterns minimize warehouse-up time: stage-and-load with COPY INTO runs a small SQL warehouse for seconds, MERGE operations during CDC are batched, and serverless SQL is used where available. In practice, ETL-driven compute is a small fraction of total DBU spend; analytics queries dominate. And because Etlworks bills per tier — not per row — your integration cost stays flat as volumes grow.
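Batching is the simplest of these levers to picture. The sketch below groups a stream of change events into fixed-size batches so that one MERGE runs per batch instead of one per event. The batch size and the helper name are illustrative assumptions, not Etlworks settings.

```python
from itertools import islice


def batches(events, max_size):
    """Yield lists of at most max_size events, preserving order."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    iterator = iter(events)
    while True:
        chunk = list(islice(iterator, max_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # 10 events with a batch size of 4 means 3 MERGE runs, not 10.
    for batch in batches(range(10), 4):
        print(len(batch), batch)
```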

Start your trial

14 days. No card. Real workloads.

Spin up a free trial, point it at your Databricks workspace, and load production data into Delta. See what predictable ingestion pricing actually feels like.
