Method
Change Data Capture (CDC)
Uses the database redo (transaction) log to track changes in the source
|
Pros
- Fast
- Reads the database redo log instead of polling database tables
- Supports deletes
- Supports near real-time replication
|
Cons
- Currently supports Postgres, MySQL, SQL Server, DB2, Oracle, and MongoDB
- Some older versions of the databases above do not support CDC
- Requires extra setup in the source database
|
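To make the log-based approach concrete, here is a minimal sketch of how captured change events might be applied to a replica. The event shape (`op`, `key`, `row`) and the `apply_changes` helper are assumptions made for illustration; they are not the format of any particular database log or tool. The point is that each insert, update, and delete arrives as an explicit event, so deletes propagate without rescanning the source tables.

```python
from typing import Any

# Hypothetical change-event shape (an assumption, not a real log format):
# {"op": "insert" | "update" | "delete", "key": <primary key>, "row": {...}}
ChangeEvent = dict[str, Any]


def apply_changes(target: dict[Any, dict], events: list[ChangeEvent]) -> dict[Any, dict]:
    """Apply change events, in commit order, to an in-memory replica."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            target[key] = event["row"]
        elif op == "delete":
            # Deletes are explicit events, so the replica can drop the row.
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target


replica = apply_changes(
    {},
    [
        {"op": "insert", "key": 1, "row": {"id": 1, "name": "Ann"}},
        {"op": "insert", "key": 2, "row": {"id": 2, "name": "Bob"}},
        {"op": "update", "key": 1, "row": {"id": 1, "name": "Anna"}},
        {"op": "delete", "key": 2},
    ],
)
print(replica)
```

After the four events the replica holds only the updated row for key 1; the row for key 2 is gone because its delete was captured.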
Method
Change Tracking (CT)
A synchronous tracking mechanism: change information becomes available as soon as the DML change is committed.
|
Pros
- Fast
- No polling from database tables
- Supports deletes
- Supports near real-time replication
|
Cons
- Microsoft SQL Server only
- Requires extra setup in the source database
|
Method
High Watermark
Uses a designated field, typically a TIMESTAMP, to track changes in the source
|
Pros
- Fast
- No extra moving parts
- Works for all data sources, including all databases, files, and APIs
|
Cons
- Does not support deletes
- Requires a dedicated high watermark field in each table
|
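The high watermark approach can be sketched with an in-memory SQLite database. The `orders` table, its `updated_at` column, and the `extract_changes` helper are hypothetical names chosen for this example. Each run selects only rows whose timestamp is greater than the stored watermark, then advances the watermark to the largest timestamp it saw.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01 10:00:00"), (2, 20.0, "2024-01-02 09:30:00")],
)


def extract_changes(conn, watermark):
    """Return rows modified after `watermark`, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Keep the old watermark when nothing changed.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark


# First run: everything is newer than the initial watermark.
first_batch, watermark = extract_changes(conn, "1970-01-01 00:00:00")

# The source changes: one row is updated, another is deleted.
conn.execute(
    "UPDATE orders SET amount = 25.0, updated_at = '2024-01-03 08:00:00' WHERE id = 2"
)
conn.execute("DELETE FROM orders WHERE id = 1")

# Second run: only the updated row is returned; the delete is invisible.
second_batch, watermark = extract_changes(conn, watermark)
print(second_batch)
```

The second run returns the updated row for `id = 2` but nothing for the deleted `id = 1`, which illustrates why this method does not support deletes.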
Method
Database Triggers
Uses tables populated by database triggers to track changes in the source
|
Pros
- Works for any source database that supports triggers
- Does not require a specific database version or an extra field in each table
|
Cons
- Requires adding triggers to all database tables
- Triggers can negatively impact performance
|
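As an illustration of trigger-based tracking, the sketch below uses SQLite, whose trigger syntax differs from other databases. The `customers` table, the `customers_changes` log table, and the trigger names are all hypothetical. Every DML statement on the tracked table fires a trigger that records the operation in the log table, and the replication process would then read that log instead of the tracked table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

    -- Hypothetical change-log table populated by the triggers below.
    CREATE TABLE customers_changes (
        change_id INTEGER PRIMARY KEY AUTOINCREMENT,
        op TEXT NOT NULL,
        customer_id INTEGER NOT NULL
    );

    CREATE TRIGGER customers_ins AFTER INSERT ON customers
    BEGIN
        INSERT INTO customers_changes (op, customer_id) VALUES ('I', NEW.id);
    END;

    CREATE TRIGGER customers_upd AFTER UPDATE ON customers
    BEGIN
        INSERT INTO customers_changes (op, customer_id) VALUES ('U', NEW.id);
    END;

    CREATE TRIGGER customers_del AFTER DELETE ON customers
    BEGIN
        INSERT INTO customers_changes (op, customer_id) VALUES ('D', OLD.id);
    END;
    """
)

# Ordinary DML on the tracked table; the triggers do the bookkeeping.
conn.execute("INSERT INTO customers VALUES (1, 'Ann')")
conn.execute("UPDATE customers SET name = 'Anna' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

changes = conn.execute(
    "SELECT op, customer_id FROM customers_changes ORDER BY change_id"
).fetchall()
print(changes)
```

Each statement costs an extra insert into the log table, which is one source of the performance impact listed under Cons.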
Method
Real-time CDC with Kafka
Polls CDC events from Kafka topics to track changes in the source.
|
Pros
- Fast
- No polling from database tables
- Supports deletes
- Supports real-time replication
|
Cons
- Complicated setup (requires Kafka, ZooKeeper, Kafka Connect, and Debezium)
- Currently supports Postgres, MySQL, SQL Server, DB2, Oracle, and MongoDB
- Some older versions of the databases above do not support CDC
- Requires extra setup in the source database
|
Method
Full Refresh
Always polls the entire dataset from the source.
|
Pros
- The simplest to set up
- Can be quite fast for relatively small datasets
- Works for all data sources
|
Cons
- Not recommended for large datasets
|
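A full refresh can be sketched as "read everything, replace everything". The example below uses two in-memory SQLite databases as stand-ins for a source and a target; the `products` table and the `full_refresh` helper are hypothetical. Because every run copies the whole table, the cost grows with the size of the dataset rather than with the number of changes.

```python
import sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")

source.executemany("INSERT INTO products VALUES (?, ?)", [(1, "pen"), (2, "ink")])
# A row left in the target from an earlier load that no longer exists upstream.
target.execute("INSERT INTO products VALUES (99, 'stale row')")


def full_refresh(source, target):
    """Replace the target table with the complete contents of the source."""
    rows = source.execute("SELECT id, name FROM products").fetchall()
    with target:  # commit the delete and reload together, or roll both back
        target.execute("DELETE FROM products")
        target.executemany("INSERT INTO products VALUES (?, ?)", rows)
    return len(rows)


loaded = full_refresh(source, target)
result = target.execute("SELECT id, name FROM products ORDER BY id").fetchall()
print(loaded, result)
```

The stale row disappears from the target as a side effect of the reload, without any change tracking in the source.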