Google BigQuery ETL

Get data into BigQuery. Get insights out.

ETL, ELT, real-time CDC, and Reverse ETL — all into and out of Google BigQuery. GCS-staged load jobs, MERGE-based change data, partitioning and clustering, 270+ source connectors. Predictable monthly pricing instead of per-row consumption.

Start free trial · Talk to us

10x: Load perf (Staples Canada)
2-way: In + Reverse ETL
<1s: CDC latency
No: Per-row billing

The problem

BigQuery bills for bytes scanned. Your ETL bill shouldn't pile on.

Most ways into BigQuery cost you twice. Consumption-priced ETL tools bill per row on top of your on-demand or slot spend. The alternative is wiring up Cloud Functions and scheduled queries that someone has to own. Either way, loading the warehouse becomes its own project, and naive loads inflate the bytes every downstream query scans.

Where budgets go to die

Per-row pricing punishes the workloads that justify the warehouse.

CDC pipelines, hourly SaaS syncs, and large historical backfills are exactly the data you want in BigQuery, and they are the workloads that cost the most under consumption pricing. Etlworks bills per platform tier, not per record, and loads into partitioned, clustered tables so downstream queries scan less. Predictable for your CFO, painless for your data team.

Capabilities

BigQuery-native, end to end.

Bulk loading via GCS

Stages files in Google Cloud Storage, then runs BigQuery load jobs at warehouse speed — the high-throughput path, not streaming inserts.
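To make the GCS-staged pattern concrete, here is a minimal sketch that builds the request body for a BigQuery load job (the `jobs.insert` REST resource) pointing at staged files. It is an illustration, not Etlworks's internal code. The project, dataset, table, and bucket names are hypothetical, and `build_load_job` is a helper invented for this example.

```python
import json

# BigQuery load-job source formats for the file types listed on this page.
SOURCE_FORMATS = {
    "csv": "CSV",
    "json": "NEWLINE_DELIMITED_JSON",
    "parquet": "PARQUET",
    "avro": "AVRO",
}


def build_load_job(project, dataset, table, gcs_uris, file_type, append=True):
    """Return a jobs.insert request body for a GCS-staged bulk load.

    Hypothetical helper for illustration; all names passed in are assumptions.
    """
    if file_type not in SOURCE_FORMATS:
        raise ValueError(f"unsupported file type: {file_type}")
    if not all(uri.startswith("gs://") for uri in gcs_uris):
        raise ValueError("this sketch only accepts gs:// staging URIs")
    return {
        "configuration": {
            "load": {
                "sourceUris": list(gcs_uris),
                "sourceFormat": SOURCE_FORMATS[file_type],
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                # Append to the target, or replace its contents.
                "writeDisposition": "WRITE_APPEND" if append else "WRITE_TRUNCATE",
            }
        }
    }


job = build_load_job(
    "example-project",
    "analytics",
    "orders",
    ["gs://example-staging-bucket/orders/*.parquet"],
    "parquet",
)
print(json.dumps(job, indent=2))
```

One load job can reference many staged files through a wildcard URI, which is what makes this path a bulk operation rather than a stream of row-level inserts.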

Real-time CDC into BigQuery

Log-based CDC from MySQL, Postgres, SQL Server, Oracle, MongoDB, and DB2, with sub-second latency and MERGE-based deduping into partitioned tables.
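As a rough sketch of what MERGE-based deduping can look like, the function below generates a BigQuery MERGE statement that keeps only the newest change event per key and then applies it as a delete, update, or insert. The staging-table layout is an assumption made for this example: an `op` column where `'d'` marks a delete, and a `change_ts` ordering column. The table names and the `build_cdc_merge` helper are hypothetical too, and the SQL Etlworks actually generates may differ.

```python
def build_cdc_merge(target, staging, key, columns, ts_col="change_ts", op_col="op"):
    """Build a MERGE that applies the latest change event per key.

    Assumes the staging table holds one row per change event, with an
    operation column ('d' = delete) and a timestamp used for ordering.
    """
    all_cols = [key] + list(columns)
    set_clause = ", ".join(f"{c} = S.{c}" for c in columns)
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"S.{c}" for c in all_cols)
    return "\n".join([
        f"MERGE `{target}` T",
        "USING (",
        # Dedupe: rank events per key, newest first, and keep rank 1.
        "  SELECT * EXCEPT(rn) FROM (",
        f"    SELECT *, ROW_NUMBER() OVER (PARTITION BY {key} ORDER BY {ts_col} DESC) AS rn",
        f"    FROM `{staging}`",
        "  ) WHERE rn = 1",
        ") S",
        f"ON T.{key} = S.{key}",
        f"WHEN MATCHED AND S.{op_col} = 'd' THEN DELETE",
        f"WHEN MATCHED THEN UPDATE SET {set_clause}",
        f"WHEN NOT MATCHED AND S.{op_col} != 'd' THEN INSERT ({insert_cols}) VALUES ({insert_vals})",
    ])


sql = build_cdc_merge(
    "example-project.analytics.orders",
    "example-project.staging.orders_changes",
    key="order_id",
    columns=["status", "amount"],
)
print(sql)
```

Because only the latest event per key reaches the MERGE, re-running the statement over the same staged batch leaves the target in the same state.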

Reverse ETL out of BigQuery

Push modeled BigQuery data to Salesforce, HubSpot, Marketo, NetSuite, and 200+ SaaS targets. Same platform, same subscription.

Partitioning & clustering

Load into partitioned, clustered tables — managed automatically — to cut scanned bytes and query cost. Schema evolution propagates without DDL drift.
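For readers who have not set up a partitioned, clustered target before, the sketch below generates the kind of DDL involved: daily partitioning on a timestamp column plus clustering on frequently filtered columns. The table and column names are hypothetical, `build_partitioned_table_ddl` is a helper invented for this example, and the partition column is assumed to be a TIMESTAMP.

```python
def build_partitioned_table_ddl(table, columns, partition_col, cluster_cols=()):
    """Build CREATE TABLE DDL with daily partitioning and optional clustering.

    columns is a list of (name, BigQuery type) pairs; partition_col is
    assumed to be a TIMESTAMP column so DATE(...) yields daily partitions.
    """
    if len(cluster_cols) > 4:
        raise ValueError("BigQuery allows at most four clustering columns")
    col_defs = ",\n".join(f"  {name} {dtype}" for name, dtype in columns)
    ddl = (
        f"CREATE TABLE IF NOT EXISTS `{table}` (\n{col_defs}\n)\n"
        f"PARTITION BY DATE({partition_col})"
    )
    if cluster_cols:
        ddl += "\nCLUSTER BY " + ", ".join(cluster_cols)
    return ddl


ddl = build_partitioned_table_ddl(
    "example-project.analytics.orders",
    [("order_id", "INT64"), ("customer_id", "INT64"), ("created_at", "TIMESTAMP")],
    partition_col="created_at",
    cluster_cols=["customer_id"],
)
print(ddl)
```

A query that filters on `created_at` can then skip partitions outside the requested date range, which is where the reduction in scanned bytes comes from.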

Transformations + ELT pushdown

SQL, JavaScript, Python — transform in flight or push down to run inside BigQuery. dbt-friendly, dbt-optional.

Cost-aware loading

GCS-staged load jobs, batched MERGE, and partition-aware writes keep load throughput high and downstream scan cost low.

Patterns

Three flows, one platform.

Every BigQuery data pipeline pattern, configured the same way. No separate tool for CDC, no separate tool for Reverse ETL, no Cloud Functions to maintain by hand.

Data in

Source → GCS → BigQuery

Bulk ETL / ELT

Stage files in GCS, then run a BigQuery load job. The pattern Google recommends, automated end to end.

Real-time

DB Log → CDC → MERGE

CDC into BigQuery

Log-based CDC streams change events into BigQuery via MERGE. Sub-second latency, no Kafka.

Reverse

BigQuery → Transform → SaaS

Reverse ETL out

Push enriched data from BigQuery to Salesforce, HubSpot, Marketo, NetSuite — 200+ SaaS targets.

Pricing transparency

A typical 50M-row pipeline, three ways.

Same workload — Salesforce account changes, Postgres orders, hourly SaaS syncs into BigQuery — priced under three common ETL pricing models. Numbers are approximate, based on public pricing as of 2026, and exclude BigQuery compute itself.

Consumption (per-row)

~$8,000/mo

Scales linearly with row volume. Hidden surge pricing during busy months.

Credit-based

~$3,500/mo

Better, but credits expire, and peak-load tier upgrades add cost.

Etlworks (fixed tier)

$1,000/mo

Standard tier, all features, all rows. Predictable for budgets, painless for data teams.
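To put the three figures on a common footing, the short calculation below converts each approximate monthly price into an effective cost per million rows at the 50M-row workload. It then shows what happens if volume doubles, assuming the per-row model scales exactly linearly (as described above) and the fixed tier stays at the same price. The credit-based model is not extrapolated, because its scaling is not stated here.

```python
ROWS_PER_MONTH = 50_000_000

# Approximate monthly prices quoted above, in USD.
monthly_cost = {
    "consumption": 8000,
    "credit_based": 3500,
    "etlworks_fixed": 1000,
}


def cost_per_million_rows(monthly_usd, rows):
    """Effective USD per million rows at a given monthly volume."""
    return monthly_usd / (rows / 1_000_000)


def projected_linear(monthly_usd, rows, new_rows):
    """Project a per-row price to a new volume, assuming linear scaling."""
    return monthly_usd * new_rows / rows


per_million = {
    model: cost_per_million_rows(cost, ROWS_PER_MONTH)
    for model, cost in monthly_cost.items()
}
print(per_million)
# {'consumption': 160.0, 'credit_based': 70.0, 'etlworks_fixed': 20.0}

doubled = 2 * ROWS_PER_MONTH
print(projected_linear(monthly_cost["consumption"], ROWS_PER_MONTH, doubled))  # 16000.0
print(cost_per_million_rows(monthly_cost["etlworks_fixed"], doubled))          # 10.0
```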

Specifications

BigQuery integration depth.

Every part of a BigQuery pipeline you'd actually run — loading, CDC, table design, and security — supported and documented.

Staging

Google Cloud Storage · auto-managed lifecycle

Load method

BigQuery load jobs for bulk · MERGE for change data

File formats

CSV, JSON, Parquet, Avro

Tables & transforms

Partitioning & clustering

Partitioned and clustered tables · automatic partition management

CDC into BigQuery

MERGE deduping · INSERT/UPDATE/DELETE preserved · idempotent

ELT & transforms

Pushdown SQL generated for BigQuery · in-flight SQL, JavaScript, Python · dbt-friendly

Security & auth

Authentication

Service account (JSON key) or OAuth · scoped to dataset and GCS bucket

Scope

Cross-project and cross-region datasets

Network

TLS in transit · static IP allowlisting for on-prem agents

Comparing BigQuery ETL tools? See Etlworks vs Fivetran, Matillion, and Airbyte

Proof

BigQuery pipelines, in production.

Staples Canada integrates data from SQL Server and file-based sources into Google BigQuery, and improved data integration performance 10x by using Etlworks's BigQuery connector with bulk load.

Staples Canada

10x load performance · SQL Server & files → BigQuery

Read the case study

FAQ

Common questions.

How does Etlworks load data into BigQuery?

Etlworks stages files in Google Cloud Storage and runs BigQuery load jobs for high-throughput bulk loading — the pattern Google recommends — rather than row-by-row inserts. CSV, JSON, Parquet, and Avro are supported. Staples Canada improved data integration performance 10x using Etlworks's BigQuery connector with bulk load.

Does Etlworks support real-time CDC into BigQuery?

Yes. Log-based CDC from MySQL, Postgres, SQL Server, Oracle, MongoDB, and DB2 streams change events into BigQuery. Changes are applied with MERGE so INSERT, UPDATE, and DELETE are preserved idempotently, with sub-second source latency.
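To illustrate what "idempotently" means here, the sketch below models the target table as an in-memory dictionary and applies a batch of change events twice. The event shape (`id`, `seq`, `op`, `row`) is an assumption made for this example, not Etlworks's change-event format. The point is only that replaying the same batch leaves the table unchanged.

```python
def apply_changes(table, events):
    """Apply change events to a table modeled as {primary_key: row}.

    Only the newest event per key (highest seq) is applied, mirroring a
    MERGE fed with deduplicated change data. Returns a new dict.
    """
    latest = {}
    for ev in events:
        current = latest.get(ev["id"])
        if current is None or ev["seq"] > current["seq"]:
            latest[ev["id"]] = ev

    result = dict(table)
    for key, ev in latest.items():
        if ev["op"] == "delete":
            result.pop(key, None)  # deleting a missing key is a no-op
        else:
            result[key] = ev["row"]  # insert and update are both upserts
    return result


table = {1: {"status": "new"}, 2: {"status": "paid"}}
batch = [
    {"id": 1, "seq": 10, "op": "update", "row": {"status": "shipped"}},
    {"id": 2, "seq": 11, "op": "delete", "row": None},
    {"id": 3, "seq": 12, "op": "insert", "row": {"status": "new"}},
    {"id": 3, "seq": 13, "op": "update", "row": {"status": "paid"}},
]

once = apply_changes(table, batch)
twice = apply_changes(once, batch)  # replay the same batch
print(once)           # {1: {'status': 'shipped'}, 3: {'status': 'paid'}}
print(once == twice)  # True
```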

Does Etlworks support partitioned and clustered BigQuery tables?

Yes. Etlworks loads into partitioned and clustered tables and can manage partitioning automatically, which keeps query cost and scan volume down. Schema evolution — new columns and type changes — propagates automatically.
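As a simplified picture of the new-column half of schema evolution, the sketch below compares a source schema with the target table's schema and emits `ALTER TABLE ... ADD COLUMN` statements for whatever is missing. The schemas, table name, and `plan_new_columns` helper are hypothetical. Type changes are deliberately left out, since they need more care than adding a column.

```python
def plan_new_columns(table, target_schema, source_schema):
    """Return ALTER TABLE statements for source columns missing in the target.

    Both schemas are {column_name: BigQuery type} dicts. Names are compared
    case-insensitively, as BigQuery column names are case-insensitive.
    """
    existing = {name.lower() for name in target_schema}
    return [
        f"ALTER TABLE `{table}` ADD COLUMN IF NOT EXISTS {name} {dtype}"
        for name, dtype in source_schema.items()
        if name.lower() not in existing
    ]


target = {"order_id": "INT64", "status": "STRING"}
source = {"order_id": "INT64", "Status": "STRING", "discount_code": "STRING"}

for statement in plan_new_columns("example-project.analytics.orders", target, source):
    print(statement)
# ALTER TABLE `example-project.analytics.orders` ADD COLUMN IF NOT EXISTS discount_code STRING
```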

How does Etlworks authenticate to BigQuery?

With a Google Cloud service account (JSON key) or OAuth, scoped to the BigQuery dataset and the GCS staging bucket. The connection works across projects and regions.

Can I use dbt with Etlworks and BigQuery?

Yes. Etlworks loads raw data into BigQuery, then you can trigger a dbt run to model it — or use Etlworks's native SQL, JavaScript, and Python transformations and skip dbt. ELT pushdown is supported, so transformations can run inside BigQuery.

Will Etlworks drive up my BigQuery costs?

Etlworks's loading patterns are designed to be efficient: GCS-staged load jobs instead of streaming where it isn't needed, batched MERGE during CDC, and partitioned/clustered targets that reduce scanned bytes on downstream queries. Because Etlworks bills per platform tier — not per row — your integration cost stays flat as data volumes grow. Talk to us for a cost comparison.

Start your trial

14 days. No card. Real workloads.

Spin up a free trial, point it at your BigQuery dataset, and load production data. See what predictable ETL pricing actually feels like.

Start free trial · Talk to us