Data Replication

Set up log-based change data capture (CDC) or full or incremental replication between different data sources with minimal effort. Use streaming platforms such as Kafka to build real-time data pipelines.

Choosing the Right Method

Data Replication Methods

The replication method you choose determines which changes are captured, how quickly they reach the destination, and how much setup the source requires, so Etlworks supports several methods to give you as much flexibility as possible. The table below gives a high-level comparison of the pros and cons of each replication method available in Etlworks.

Method Pros Cons
Method
Change Data Capture (CDC)

Uses the database redo (transaction) log to track changes in the source.

Pros
  • Fast
  • Does not poll database tables; reads the database redo log instead
  • Supports deletes
  • Supports near real-time replication
Cons
  • Currently supports only Postgres, MySQL, SQL Server, DB2, Oracle, and MongoDB
  • Some older versions of the databases above do not support CDC
  • Requires extra setup in the source database
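
To make the Change Data Capture idea concrete, here is a minimal sketch of the consuming side: replaying an ordered stream of change events onto a replica. The event shape (`op`, `key`, `row`) and the function name `apply_change_events` are assumptions made for illustration; they are not the format Etlworks or any particular database log uses.

```python
from typing import Any

def apply_change_events(
    replica: dict[int, dict[str, Any]],
    events: list[dict[str, Any]],
) -> dict[int, dict[str, Any]]:
    """Replay ordered change events onto an in-memory replica keyed by primary key."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Inserts and updates both carry the full new row image.
            replica[key] = event["row"]
        elif op == "delete":
            # Deletes are visible in the log, so the replica can drop the row.
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return replica

# Hypothetical events, in commit order, as a log reader might emit them.
events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "name": "Ann"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "name": "Bob"}},
    {"op": "update", "key": 1, "row": {"id": 1, "name": "Anna"}},
    {"op": "delete", "key": 2},
]
replica = apply_change_events({}, events)
print(replica)
```

Because every change, including the delete, arrives as an event, the replica ends up matching the source without the source tables ever being queried.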
Method
Change Tracking (CT)

A synchronous tracking mechanism in which change information becomes available as soon as the DML change is committed.

Pros
  • Fast
  • No polling from database tables
  • Supports deletes
  • Supports near real-time replication
Cons
  • Microsoft SQL Server only
  • Requires extra setup in the source database
Method
High Watermark

Uses a designated field, typically a TIMESTAMP, to track changes in the source.

Pros
  • Fast
  • No extra moving parts
  • Works for all data sources, including all databases, files, and APIs
Cons
  • Does not support deletes
  • Requires a dedicated high watermark field in each table
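
The High Watermark method can be sketched with an in-memory SQLite database, as below. The table `customers`, the watermark column `updated_at`, and the helper `extract_changes` are hypothetical names chosen for this example; a real flow would keep the last watermark in persistent storage between runs.

```python
import sqlite3

def extract_changes(conn: sqlite3.Connection, last_watermark: str):
    """Return rows modified after last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark only if something new was read.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", "2024-01-01 10:00:00"), (2, "Bob", "2024-01-02 09:30:00")],
)

# First run: everything is newer than the initial watermark.
first_batch, watermark = extract_changes(conn, "1970-01-01 00:00:00")

# An update moves the watermark field forward; a delete leaves no row to select.
conn.execute(
    "UPDATE customers SET name = 'Anna', updated_at = '2024-01-03 08:00:00' WHERE id = 1"
)
conn.execute("DELETE FROM customers WHERE id = 2")
second_batch, watermark = extract_changes(conn, watermark)
print(second_batch)
```

The second run picks up the updated row only. The deleted row simply disappears from the source, which is why this method cannot replicate deletes.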
Method
Database Triggers

Uses one or more tables updated by database triggers to track changes in the source.

Pros
  • Works for any source database that supports triggers
  • Does not require a specific database version or an extra field in each table
Cons
  • Requires adding triggers to all database tables
  • Triggers can negatively impact performance
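
As an illustration of the Database Triggers method, the sketch below uses SQLite, where three triggers write every insert, update, and delete on a table into a separate change table. The table and trigger names (`orders`, `orders_changes`, `orders_ai`, and so on) are assumptions for this example, and trigger syntax differs between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);

-- Change table that the replication flow reads instead of the source table.
CREATE TABLE orders_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT NOT NULL,
    order_id INTEGER NOT NULL
);

CREATE TRIGGER orders_ai AFTER INSERT ON orders BEGIN
    INSERT INTO orders_changes (op, order_id) VALUES ('I', NEW.id);
END;
CREATE TRIGGER orders_au AFTER UPDATE ON orders BEGIN
    INSERT INTO orders_changes (op, order_id) VALUES ('U', NEW.id);
END;
CREATE TRIGGER orders_ad AFTER DELETE ON orders BEGIN
    INSERT INTO orders_changes (op, order_id) VALUES ('D', OLD.id);
END;
""")

conn.execute("INSERT INTO orders VALUES (1, 'new')")
conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 1")

# The change table now holds one entry per DML statement, in order.
changes = conn.execute(
    "SELECT op, order_id FROM orders_changes ORDER BY change_id"
).fetchall()
print(changes)
```

Each write to `orders` now also performs a write to `orders_changes`, which is where the performance cost listed above comes from.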
Method
Real-time CDC with Kafka

Polls CDC events from the Kafka topic(s) to track changes in the source.

Pros
  • Fast
  • No polling from database tables
  • Supports deletes
  • Supports real-time replication
Cons
  • Complicated setup (requires Kafka, ZooKeeper, Kafka Connect, and Debezium)
  • Currently supports only Postgres, MySQL, SQL Server, DB2, Oracle, and MongoDB
  • Some older versions of the databases above do not support CDC
  • Requires extra setup in the source database
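
For the Kafka-based method, the events polled from a topic are typically Debezium change events. The sketch below decodes one such message, assuming the documented Debezium envelope fields `op`, `before`, and `after`. Reading from Kafka itself is omitted, the sample messages are hand-written, and treating snapshot reads (`r`) as inserts is a choice made for this example.

```python
import json
from typing import Any, Optional

# Debezium operation codes: c = create, r = snapshot read, u = update, d = delete.
OP_NAMES = {"c": "insert", "r": "insert", "u": "update", "d": "delete"}

def decode_event(raw: str) -> tuple[str, Optional[dict[str, Any]]]:
    """Turn a Debezium-style JSON message into (operation, row image)."""
    message = json.loads(raw)
    # With schemas enabled the envelope sits under "payload";
    # otherwise the message itself is the envelope.
    payload = message.get("payload", message)
    op = OP_NAMES[payload["op"]]
    # A delete carries the old row in "before"; other operations use "after".
    row = payload["before"] if op == "delete" else payload["after"]
    return op, row

# Hand-written sample messages (illustrative, not captured from a real topic).
update_event = json.dumps({
    "payload": {
        "op": "u",
        "before": {"id": 1, "status": "new"},
        "after": {"id": 1, "status": "shipped"},
    }
})
delete_event = json.dumps(
    {"op": "d", "before": {"id": 1, "status": "shipped"}, "after": None}
)

print(decode_event(update_event))
print(decode_event(delete_event))
```

The decoded operation and row image can then be applied to the destination in the same way as any other CDC event.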
Method
Full Refresh

Always polls the entire dataset from the source.

Pros
  • The simplest to set up
  • Can be quite fast for relatively small datasets
  • Works for all data sources
Cons
  • Not recommended for large datasets
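
A Full Refresh can be sketched as "read everything, then replace everything", shown here with two in-memory SQLite databases. The table `products` and the function `full_refresh` are hypothetical names for this example, and a production flow would more likely stream rows in batches than hold them all in memory.

```python
import sqlite3

def full_refresh(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Replace the target table's contents with the full source dataset."""
    rows = source.execute("SELECT id, name FROM products ORDER BY id").fetchall()
    with target:  # commit the delete and the reload together, or roll both back
        target.execute("DELETE FROM products")
        target.executemany("INSERT INTO products VALUES (?, ?)", rows)
    return len(rows)

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")

source.executemany("INSERT INTO products VALUES (?, ?)", [(1, "bolt"), (2, "nut")])
# Stale target data: one outdated row and one row deleted at the source.
target.executemany("INSERT INTO products VALUES (?, ?)", [(1, "old bolt"), (3, "gone")])

loaded = full_refresh(source, target)
result = target.execute("SELECT id, name FROM products ORDER BY id").fetchall()
print(loaded, result)
```

Every run copies the whole table regardless of how little changed, so the cost grows with the size of the dataset and not with the number of changes.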
