Real-time CDC

Stream every database change.

Log-based change data capture for MySQL, Postgres, SQL Server, Oracle, MongoDB, DB2, and IBM i. Sub-second latency. Built-in CDC engine — no Kafka, no Debezium cluster, no Zookeeper.

<1s
Replication latency
7
Database engines
10K+
Records per second
No
Kafka required

The problem

CDC done right is harder than it looks.

Most “real-time” data tools poll every few minutes and call it streaming. True log-based CDC reads database transaction logs directly — but doing that without dropping events, fighting with Kafka, or melting your source database is the hard part.

Why most tools get this wrong

Polling isn't streaming. Kafka isn't free.

Tools that “support CDC” often mean periodic queries against an updated_at column: fine for low-volume tables, but polling misses hard deletes and intermediate updates, and it falls over on transactional workloads. Tools that do real log-based CDC usually require Debezium + Kafka + schema registry + a team to keep them running. Etlworks built CDC into the core engine. Same Debezium-compatible binlog reading, no separate cluster to manage.
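To make the distinction concrete, here is the polling pattern in miniature, as a sketch with illustrative table and column names rather than Etlworks code. Note what polling structurally cannot see: hard deletes never match the predicate, and a row updated twice between polls surfaces only its final state.

```python
# The updated_at polling anti-pattern, sketched. Table, column, and
# timestamp values are illustrative only -- this is not Etlworks code.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, payload TEXT, updated_at TEXT)")
# The row was inserted as 'v1', then updated to 'v2' between polls.
conn.execute("INSERT INTO orders VALUES (1, 'v2', '2024-01-01T00:05:00')")

last_seen = "2024-01-01T00:00:00"

# One poll pass. Hard deletes never match this predicate, and a row
# changed twice between polls surfaces only its final state.
rows = conn.execute(
    "SELECT id, payload, updated_at FROM orders WHERE updated_at > ?",
    (last_seen,),
).fetchall()
for _id, payload, updated_at in rows:
    last_seen = max(last_seen, updated_at)

print(rows)  # [(1, 'v2', ...)] -- the intermediate 'v1' state is invisible
```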

Capabilities

What's built in.

Log-based CDC

Reads database transaction logs directly. No polling, no triggers, no impact on source performance.

Sub-second latency

Changes propagate from source to destination in under a second. Lag metrics visible in monitoring.

No Kafka required

CDC engine runs inside Etlworks. Optional Kafka output if you want it, but not required for replication.

Schema evolution

Source schema changes (added columns, type changes) propagate automatically. No pipeline rebuild required.

Exactly-once delivery

Idempotent merge into destinations. Failures resume from the last committed offset: no duplicates, no gaps. See the sketch after this list.

Backfill + ongoing

Full snapshot first, then seamless transition to log-based streaming. One pipeline, no orchestration.
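
The exactly-once card above rests on one property: applying a change batch is idempotent, so replaying it after a failure is harmless. A minimal sketch of that pattern, using SQLite and illustrative table names (the statement shape only, not Etlworks internals):

```python
# Idempotent merge, sketched with SQLite: applying the same change batch
# twice leaves the destination in the same state, so a resume-and-replay
# after failure cannot create duplicates. Table and column names are
# illustrative, not Etlworks internals.
import sqlite3

dest = sqlite3.connect(":memory:")
dest.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

def apply_batch(conn, batch):
    # Keyed upsert: applying once or N times yields the same state.
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        batch,
    )
    conn.commit()

batch = [(1, "Ada"), (2, "Grace")]
apply_batch(dest, batch)
apply_batch(dest, batch)  # replay after a crash: still exactly two rows

print(dest.execute("SELECT COUNT(*) FROM customers").fetchone())  # (2,)
```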

Specifications

Supported databases.

Seven database engines with full log-based CDC. Each with documented configuration, supported versions, and proven deployments at scale.

Relational databases
MySQL / MariaDB
5.7+ / 8.x · row-based binlog · GTID supported
PostgreSQL
10+ · logical replication via pgoutput · publications and replication slots managed
SQL Server
2016+ · CDC tables and Change Tracking · Always On supported
Oracle
11g+ / 12c+ / 19c · LogMiner · GoldenGate-compatible
NoSQL & other
MongoDB
3.6+ · oplog-based · sharded clusters supported
IBM DB2
10+ · log-based CDC · LUW and z/OS
AS/400 / IBM i
DB2 for IBM i 7.1+ · journal-based CDC

Comparing CDC tools? See Etlworks vs Debezium, Fivetran, and Qlik Replicate

Proof

Petabyte-scale CDC, in production.

“Etlworks is the greatest combination of performance, versatility, cost efficiency, reliability, and ease of use we've seen for anything ETL-related. Whether it's live CDC streaming, bulk loading, or complex ETL pipelines — everything has a solution.”
Heiko Parmas
Data Warehouse Architect, AS Tallink Grupp

FAQ

Common questions.

Do I need Kafka or Debezium installed?
No. Etlworks's CDC engine is built in and runs inside the platform. It uses the same Debezium-compatible binlog reading approach but doesn't require a separate Debezium installation, Kafka cluster, or schema registry. If you already use Kafka, you can publish CDC events to it as a destination — but it's not a prerequisite.
What's the latency from source change to destination?
Sub-second in most production deployments. Actual latency depends on source database load, network, and destination throughput, but the engine itself processes change events as they're emitted by the database log. Lag metrics are visible per-pipeline in monitoring.
How does Etlworks handle schema changes on the source?
Schema evolution is automatic for additive changes (new columns, larger types). Destructive changes (dropped columns, incompatible type changes) trigger a pipeline alert and require explicit confirmation before propagating. Schema history is retained for audit.
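Conceptually, that policy reduces to sorting DDL changes by safety. A toy sketch of the split, with assumed event shape and category names rather than the actual Etlworks API:

```python
# Toy classification of source DDL into auto-propagated vs
# confirmation-required changes, mirroring the policy above. The event
# shape and category names are assumptions, not the Etlworks API.
ADDITIVE = {"ADD_COLUMN", "WIDEN_TYPE"}       # propagate automatically
DESTRUCTIVE = {"DROP_COLUMN", "NARROW_TYPE"}  # alert, wait for confirmation

def handle_ddl(change: dict) -> str:
    if change["kind"] in ADDITIVE:
        return "apply"
    # Destructive or unknown changes default to the safe path.
    return "hold_for_confirmation"

print(handle_ddl({"kind": "ADD_COLUMN", "table": "orders"}))   # apply
print(handle_ddl({"kind": "DROP_COLUMN", "table": "orders"}))  # hold_for_confirmation
```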
Can I replicate from on-prem databases to cloud destinations?
Yes. Etlworks's hybrid agent runs inside your network, reads the database log locally, and streams change events out through an outbound HTTPS connection. No inbound ports, no VPN required. Used in production for petabyte-scale on-prem-to-Snowflake / BigQuery / Redshift pipelines.
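The direction of that connection is the whole trick: the agent dials out, nothing dials in. A minimal sketch under assumed names (the endpoint URL and payload shape are hypothetical):

```python
# Direction of the hybrid-agent connection, sketched: the agent inside
# your network initiates an outbound HTTPS POST per batch of change
# events. The endpoint URL and payload shape are hypothetical.
import json
import urllib.request

def push_batch(events: list, endpoint: str) -> int:
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(events).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Outbound only: the agent dials out, nothing dials in, which is
    # why no inbound firewall rule or VPN is needed.
    with urllib.request.urlopen(req) as resp:
        return resp.status

# push_batch([{"op": "u", "table": "orders"}], "https://<your-tenant>/ingest")
```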
What happens if the destination is temporarily unavailable?
CDC pipelines persist offsets and resume from the last committed position. Source database log retention is the only constraint — as long as the log is still available, no events are lost. Default retention is 7 days; configurable per source.
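That resume behavior is the standard checkpoint-offset pattern: apply, durably commit the log position, and on restart skip anything at or below the committed offset. A minimal sketch with an illustrative on-disk state file:

```python
# Checkpoint-offset resume, sketched: apply a batch, durably commit its
# log position, and on restart skip everything at or below the committed
# offset. The state file and event list are illustrative stand-ins for
# the database log and the engine's offset store.
import json
import pathlib

STATE = pathlib.Path("offset.json")

def load_offset() -> int:
    return json.loads(STATE.read_text())["offset"] if STATE.exists() else 0

def commit_offset(pos: int) -> None:
    STATE.write_text(json.dumps({"offset": pos}))

log = [(1, "insert"), (2, "update"), (3, "delete")]  # stand-in for the log
offset = load_offset()
for pos, event in log:
    if pos <= offset:
        continue              # already applied before the restart
    print("apply", event)     # idempotent apply (see the merge sketch)
    commit_offset(pos)        # durable checkpoint after each event
```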
How is this different from Debezium?
Etlworks's CDC engine is Debezium-compatible at the protocol level — same binlog-reading approach, same change event format. The differences: it runs inside Etlworks (no separate Kafka Connect / Zookeeper / schema registry), includes a managed UI, supports transformations and routing in the same pipeline, and includes destinations beyond Kafka topics (warehouses, files, APIs). If you want raw Debezium for fan-out to many Kafka consumers, that's still a great fit. If you want managed CDC with destinations and transforms, Etlworks is faster to deploy.
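For readers who know Debezium, “same change event format” means the familiar envelope. An illustrative event (values made up; the shape follows Debezium's documented format):

```python
# The Debezium-style change-event envelope referenced above: before/after
# row images, source metadata, an op code, and a timestamp. Field values
# here are made up; the shape follows Debezium's documented format.
change_event = {
    "before": {"id": 42, "status": "pending"},
    "after":  {"id": 42, "status": "shipped"},
    "source": {"connector": "mysql", "db": "shop", "table": "orders",
               "file": "binlog.000017", "pos": 154},
    "op": "u",               # c=create, u=update, d=delete, r=snapshot read
    "ts_ms": 1735689600000,  # when the connector processed the change
}
```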
What about pricing for high-volume CDC?
Etlworks pricing is per-tier, not per-row or per-event. CDC throughput is included in every paid plan. High-volume customers (hundreds of millions of changes per day) typically run on Enterprise plans for HA and dedicated support, but there's no per-event surcharge.

Start your trial

14 days. No card. Real workloads.

Spin up a free trial, point it at your production database, and see what sub-second CDC actually feels like.