Data Pipeline Testing
Data pipeline testing is the discipline of validating that your pipelines produce correct, complete, and trustworthy data before consumers see it. Unlike software unit tests (which validate code), data tests validate the data itself: row counts, null rates, schema, referential integrity, business rules, anomaly detection. dbt tests, Great Expectations, and Soda Core are the dominant frameworks. The hard truth: most data pipelines have between 0 and 5 tests in production, and most failures are detected by an angry executive seeing a wrong number on a dashboard. Engineering teams that ship 80% test-coverage code routinely ship 0% test-coverage data pipelines and act surprised when data quality is bad.
The Trap
The trap is testing only schema (column types and nullability) and calling it done. Schema tests catch about 20% of real-world data quality issues. The other 80% are logical: 'revenue is negative,' 'customer_id appears in 50% of rows when it should be 100%,' 'today's row count is 30% lower than yesterday's,' 'duplicate primary keys silently overwrote existing rows.' These require business-rule tests and freshness/volume anomaly tests. Teams that test only schema get a false sense of security and ship broken data confidently.
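To make the distinction concrete, here is a minimal sketch of what a business-rule check can look like as a dbt "singular test": a SQL file under tests/ whose query selects violating rows, so the test fails if any rows come back. The file, model, and column names (fct_orders, revenue, customer_id, order_id) are placeholders, not anything prescribed by dbt.

```sql
-- tests/assert_orders_business_rules.sql  (illustrative file name)
-- dbt singular test: the test fails if this query returns any rows.
select
    order_id,
    revenue,
    customer_id
from {{ ref('fct_orders') }}
where revenue < 0            -- 'revenue is negative'
   or customer_id is null    -- should be populated on 100% of rows
```

```sql
-- tests/assert_no_duplicate_order_ids.sql  (illustrative file name)
-- Duplicate primary keys that would silently overwrite on an upsert/merge.
select order_id, count(*) as n
from {{ ref('fct_orders') }}
group by order_id
having count(*) > 1
```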
What to Do
Adopt a tiered testing strategy:
1. Schema tests on every model: column types, nullability, primary key uniqueness.
2. Business-rule tests on critical models: revenue ≥ 0, valid status enums, foreign key integrity.
3. Freshness and volume tests: alert when a daily pipeline produces zero rows or 50% fewer rows than the trailing 7-day average (sketched below).
4. Data contract tests at producer-consumer boundaries.
Run all tests as part of the pipeline and fail loudly. Define a policy for what fails the pipeline (blocks downstream) versus what only alerts (continues, but pages someone).
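A minimal sketch of tier 3, again as a dbt singular test: it fails when today's load is empty or falls more than 50% below the trailing 7-day average. It assumes a loaded_date column on the model and Postgres/Redshift-style date arithmetic; both are assumptions to adapt to your warehouse.

```sql
-- tests/assert_fct_orders_volume.sql  (illustrative)
with daily as (
    select loaded_date, count(*) as row_count
    from {{ ref('fct_orders') }}
    where loaded_date >= current_date - 7
    group by loaded_date
),

today as (
    select coalesce(max(row_count), 0) as row_count
    from daily
    where loaded_date = current_date
),

baseline as (
    select avg(row_count) as avg_rows
    from daily
    where loaded_date < current_date   -- trailing 7 days, excluding today
)

-- Returning a row means the test fails: zero rows loaded today,
-- or volume dropped more than 50% below the recent average.
select today.row_count, baseline.avg_rows
from today
cross join baseline
where today.row_count = 0
   or today.row_count < 0.5 * baseline.avg_rows
```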
In Practice
dbt (data build tool) ships with built-in tests for unique, not_null, accepted_values, and relationships. By 2024 it was the de facto SQL transformation framework with hundreds of thousands of users. Great Expectations (founded 2018) extended this with a richer expectation library: column distributions, time-series anomalies, conditional expectations. The combination (dbt for transformation tests plus Great Expectations for advanced data quality) is the canonical modern stack. Yet dbt's own community surveys consistently show median test coverage per project is 1-2 tests per model, far below what catches real failures.
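For a sense of what these built-in checks assert, the relationships test is referential integrity: every foreign key in the child model must exist in the parent. A hand-written equivalent looks roughly like the sketch below; this is not the exact SQL dbt generates, and the model and column names are illustrative.

```sql
-- Orphaned foreign keys: fails (returns rows) if any order points at a
-- customer that does not exist in the dimension table.
select o.customer_id
from {{ ref('fct_orders') }} as o
left join {{ ref('dim_customers') }} as c
    on o.customer_id = c.customer_id
where o.customer_id is not null
  and c.customer_id is null
```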
Pro Tips
1. Volume anomaly tests are the highest-ROI single test you can add. "Today's row count is between 70% and 130% of the 7-day median" catches dropped sources, broken joins, and runaway duplicates. Add it to every fact table.
2. Test at the contract boundary, not just the destination. If team A produces a table consumed by teams B, C, and D, the producer should run tests that fail their own pipeline when outputs violate the contract (see the producer-side sketch after this list). Catching it downstream is too late.
3. Calogica's dbt-expectations package ports Great Expectations-style checks into dbt test syntax. For SQL-first teams, this gives you Great Expectations power without the Python infrastructure.
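A sketch of the producer-side contract test from tip 2, assuming a published table called orders_export and a contract that guarantees a non-null customer key, a fixed status enum, and non-negative totals; every name and value here is a placeholder for whatever your contract actually specifies.

```sql
-- tests/assert_orders_export_contract.sql  (illustrative)
-- Run by the producing team; failing rows block their own pipeline
-- before consumers B, C, and D ever see a contract violation.
select *
from {{ ref('orders_export') }}
where customer_id is null                                      -- contracted: never null
   or order_status is null
   or order_status not in ('placed', 'shipped', 'cancelled')   -- contracted enum
   or order_total < 0                                          -- contracted: non-negative
```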
Myth vs Reality
Myth
"Data tests are just like unit tests"
Reality
Unit tests validate deterministic code with deterministic inputs. Data tests validate data that changes constantly with real-world variation. A data test that fires once a year on a real edge case is valuable. A unit test with that signal-to-noise ratio is broken. Different discipline, similar tooling.
Myth
"Testing slows pipelines down too much"
Reality
Tests typically add 5-15% to pipeline runtime. Compared to the cost of one wrong board-deck number, that's trivial. Teams that 'can't afford to add tests' usually can't afford the alternative: they're already paying the cost in incidents, just not measuring it.
Try it
Run the numbers.
Pressure-test the concept against your own knowledge: answer the challenge or try the live scenario.
Knowledge Check
Your data team ships 200 dbt models and has roughly 150 tests total, almost entirely 'unique' and 'not_null' on primary keys. The CFO discovers a $2M revenue reporting error caused by a silently dropped customer source. What's the single highest-ROI test to add?
Industry benchmarks
Is your number good?
Calibrate against real-world tiers. Use these ranges as targets, not absolutes.
Data Test Coverage (% of models with ≥3 tests)
Data teams using dbt or similar transformation frameworks
- Elite: > 80%
- Good: 50-80%
- Average: 20-50%
- Underinvested: < 20%
Source: Hypothetical synthesis from dbt Community Surveys
Real-world cases
Companies that lived this.
Verified narratives with the numbers that prove (or break) the concept.
dbt Labs
2016-2026
dbt was created by Tristan Handy at RJMetrics (then Fishtown Analytics, now dbt Labs) to bring software engineering practices to SQL-based data transformation: tests, version control, modularity. Built-in tests (unique, not_null, accepted_values, relationships) made data testing accessible; adding a test takes four lines of YAML. By 2024, dbt was the dominant transformation framework with hundreds of thousands of users. Yet community surveys consistently showed median test coverage per project was 1-2 tests per model, far below what's needed to catch most failure modes.
Founded
~2016
Built-in Tests
4 (unique, not_null, accepted_values, relationships)
Median Tests/Model
1-2 (far too low)
Making testing easy is necessary but not sufficient. Most teams stop at 1-2 tests per model. The discipline of writing real business-rule tests requires cultural investment, not just tooling.
Great Expectations
2018-2026
Great Expectations (founded 2018 by Abe Gong and James Campbell) extended data testing beyond schema with a rich expectation library: column distributions, time-series anomalies, multi-column expectations, custom Python checks. It became the de facto Python-native data quality framework. Often paired with dbt (for SQL transformations) and Airflow/Dagster (for orchestration). The trio (dbt + Great Expectations + Dagster) is a canonical modern data quality stack.
Founded
2018
Expectations Available
300+
Common Pairing
dbt + Dagster + Great Expectations
Different tests need different tools. SQL-native checks belong in dbt; statistical and distributional checks need a richer framework like Great Expectations. Use the right tool for the right test.
Decision scenario
Building a Data Testing Discipline
You're VP Data at a 300-person company. Last quarter: 18 production data incidents, 4 visible to executives, 1 caused a $400K revenue mis-report. Your team of 12 has been resistant to writing tests because 'it slows us down.' You have one quarter to change this.
Quarterly Incidents
18
Executive-Visible
4
dbt Models
180
Test Coverage
~8%
Decision 1
You need to choose a strategy that ships visible improvement in one quarter.
Mandate 5+ tests on every model: 100% coverage by quarter end.
Tier the models: identify the 30 'critical' models (those feeding exec dashboards or finance), require 5 high-value tests on each (volume anomaly, business rules, integrity), and accept lower coverage on the rest. (Optimal)
Related concepts
Keep connecting.
The concepts that orbit this one; each one sharpens the others.
Beyond the concept
Turn Data Pipeline Testing into a live operating decision.
Use this concept as the framing layer, then move into a diagnostic if it maps directly to a current bottleneck.
Typical response time: 24h · No retainer required