Data Replication

Set up log-based change data capture (CDC), full, or incremental replication between different data sources with minimal effort. Use streaming platforms such as Kafka to build real-time data pipelines.

Choosing the Right Method

Data Replication Methods

Because the replication method you choose affects your data, Etlworks supports several methods to give you as much flexibility as possible. The table below summarizes each replication method available in Etlworks and compares their pros and cons.

Method	Pros	Cons
Change Data Capture (CDC): uses the database redo (transaction) log to track changes in the source.	Fast; no polling of database tables (reads the database redo log instead); supports deletes; supports near real-time replication.	Currently supports Postgres, MySQL, SQL Server, DB2, Oracle, and MongoDB; some older versions of these databases do not support CDC; requires extra setup in the source database.
Change Data Tracking (CT): a synchronous tracking mechanism in which changes become available as soon as the DML change is committed.	Fast; no polling of database tables; supports deletes; supports near real-time replication.	Microsoft SQL Server only; requires extra setup in the source database.
High Watermark: uses a designated field, typically a TIMESTAMP, to track changes in the source.	Fast; no extra moving parts; works for all data sources, including all databases, files, and APIs.	Does not support deletes; requires a dedicated high watermark field in each table.
Database Triggers: uses table(s) updated by database triggers to track changes in the source.	Works for any source database that has triggers; no requirement for a specific database version or an extra field in each table.	Requires adding triggers to all database tables; triggers can negatively impact performance.
Real-time CDC with Kafka: polls CDC events from the Kafka topic(s) to track changes in the source.	Fast; no polling of database tables; supports deletes; supports real-time replication.	Complicated setup (requires Kafka, Zookeeper, Kafka Connect, and Debezium); currently supports Postgres, MySQL, SQL Server, DB2, Oracle, and MongoDB; some older versions of these databases do not support CDC; requires extra setup in the source database.
Full Refresh: always polls the entire dataset from the source.	The simplest to set up; can be quite fast for relatively small datasets; works for all data sources.	Not recommended for large datasets.
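
To make the High Watermark row more concrete, here is a minimal sketch of the general technique in Python, using an in-memory SQLite database. It is not Etlworks code: the customers table, the updated_at column, and the fetch_changes helper are hypothetical names chosen for illustration. The sketch also shows why the method does not support deletes: a deleted row simply never appears in a later batch.

```python
import sqlite3


def fetch_changes(conn, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark.

    Hypothetical helper: assumes a 'customers' table whose 'updated_at'
    column is set on every insert and update.
    """
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark to the latest change seen, if any.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", "2024-01-01 10:00:00"), (2, "Bob", "2024-01-01 11:00:00")],
)

# First run: the watermark is empty, so every row is picked up.
first_batch, watermark = fetch_changes(conn, "")

# An update moves row 1 past the watermark; the delete of row 2 leaves no trace.
conn.execute(
    "UPDATE customers SET name = 'Ann B.', updated_at = '2024-01-02 09:00:00' "
    "WHERE id = 1"
)
conn.execute("DELETE FROM customers WHERE id = 2")

# Second run: only the updated row is returned; the delete goes unnoticed.
second_batch, watermark = fetch_changes(conn, watermark)
```

In this sketch the second batch contains only the updated row, so a destination kept in sync this way would still hold the deleted row. Log-based CDC avoids that gap at the cost of extra setup in the source database.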

