Data Strategy · Intermediate · 6 min read

Data Quality Scorecard

A Data Quality Scorecard is a continuously measured composite score across the six classical dimensions of data quality: (1) Completeness, the % of required fields populated; (2) Accuracy, the % matching ground truth; (3) Consistency, the % matching across systems; (4) Timeliness, the % delivered within SLA; (5) Validity, the % conforming to format and business rules; (6) Uniqueness, the % free of duplicates. Each dataset gets a score per dimension, a composite Data Health Score, and a trend line. The scorecard converts the abstract question 'is our data good?' into a number that can be put under SLA, owned, and improved. Without a scorecard, data quality is a feeling; with one, it becomes a managed asset.

Also known as: DQ Scorecard, Data Quality Index (DQI), Data Health Score, Data Trust Score

The Trap

The trap is producing a beautiful dashboard nobody acts on. Most data quality scorecards measure everything (200 datasets × 6 dimensions = 1,200 scores) and surface red boxes the data team can't fix because they don't own the source systems. The scorecard becomes a complaint mechanism: 'Sales operations data is 47% complete', yet Sales Ops never sees the scorecard or gets budget to fix it. The other trap is averaging dimensions into a single score that hides the real problem: 95% accuracy on a dataset with 0% timeliness is useless for real-time decisions. Always show the dimensions separately AND the composite.

What to Do

Build a tiered scorecard. Tier 1: ~10 critical datasets (revenue, customer, product, financial close), scored weekly, with a named business owner who is on the hook for the score. Tier 2: ~50 important datasets, scored monthly. Tier 3: everything else, scored quarterly with passive monitoring. For each Tier 1 dataset, set explicit SLAs per dimension (e.g., 'customer master: 99% completeness, 95% accuracy, 24-hour timeliness') and tie owner bonuses to the score. Anything else is theater.
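A minimal sketch of what per-dataset, per-dimension SLAs might look like in practice; the dataset names, owners, and thresholds below are illustrative assumptions, not a specific tool's schema:

```python
# Hypothetical per-dataset, per-dimension SLA thresholds (percent).
SLAS = {
    "customer_master": {"tier": 1, "owner": "Sales Ops",
                        "completeness": 99, "accuracy": 95, "timeliness": 95},
    "revenue_daily":   {"tier": 1, "owner": "Finance",
                        "completeness": 99, "accuracy": 99, "timeliness": 99},
}

def sla_breaches(dataset: str, scores: dict) -> list:
    """Return the dimensions whose measured score falls below the SLA."""
    sla = SLAS[dataset]
    return [dim for dim, threshold in sla.items()
            if dim not in ("tier", "owner") and scores.get(dim, 0) < threshold]

breaches = sla_breaches("customer_master",
                        {"completeness": 94, "accuracy": 96, "timeliness": 99})
if breaches:
    # Escalation targets the named business owner, not the data team.
    print(f"Page {SLAS['customer_master']['owner']}: breached {breaches}")
```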

Formula

Composite Data Health Score = Weighted Average(Completeness, Accuracy, Consistency, Timeliness, Validity, Uniqueness). Typical weights: Accuracy 30%, Completeness 20%, Timeliness 20%, Consistency 15%, Validity 10%, Uniqueness 5%. Tune by use case.
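A minimal sketch of the composite calculation, assuming the typical weights above and scores on a 0-100 scale (the example dataset reuses the numbers from the Knowledge Check below):

```python
# Typical weights from the formula above (sum to 1.0).
WEIGHTS = {"accuracy": 0.30, "completeness": 0.20, "timeliness": 0.20,
           "consistency": 0.15, "validity": 0.10, "uniqueness": 0.05}

def composite_score(scores: dict) -> float:
    """Weighted average of per-dimension scores (each on a 0-100 scale)."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

# Example dataset (same numbers as the Knowledge Check below).
customer = {"completeness": 95, "accuracy": 92, "consistency": 88,
            "timeliness": 65, "validity": 96, "uniqueness": 91}
print(round(composite_score(customer)))  # 87
```

Note how a composite of ~87 looks healthy while timeliness sits at 65%, which is exactly why the dimensions should always be shown alongside the composite.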

In Practice

Airbnb's data quality system, 'Wall' (publicly described on their engineering blog), monitors thousands of internal datasets against quality rules and produces tiered health scores. They built it after a 2017 incident in which bad upstream data caused incorrect search ranking for ~24 hours, materially impacting bookings. Each tiered dataset has a named owner who gets paged when SLAs break. Wall publishes both per-dataset scores and trends, and the data platform team's OKRs include reducing the number of Tier-1 SLA breaches. The system is credited with restoring engineering trust in centralized data after a period of widespread side-channel pipelines.

Pro Tips

  • 01

    Tie data quality to a downstream business KPI to make it real. 'Customer master completeness improved from 78% to 94% → marketing email bounce rate dropped from 9% to 2.3% → est. $1.2M annual savings'. Quality scores without business outcomes don't get sustained funding.

  • 02

    The most impactful quality dimension is usually Timeliness. A 99% accurate metric delivered 3 days late is useless for operational decisions. Many enterprises over-invest in accuracy and under-invest in freshness. Score timeliness aggressively (see the sketch after this list).

  • 03

    Make broken data the source system owner's problem, not the data team's. If Salesforce data is 30% incomplete, the fix lives with Sales Operations and the CRO, not the data engineers. Scorecards need an escalation path that puts business pressure on the right team.
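The timeliness sketch referenced in tip 02, assuming a simple log of expected versus actual delivery times and an illustrative 24-hour SLA window:

```python
from datetime import datetime, timedelta

SLA_WINDOW = timedelta(hours=24)  # assumed delivery SLA for this dataset

# Hypothetical (expected, actual) delivery timestamps for recent loads.
deliveries = [
    (datetime(2024, 5, 1, 6, 0), datetime(2024, 5, 1, 7, 0)),   # 1h late: OK
    (datetime(2024, 5, 2, 6, 0), datetime(2024, 5, 5, 6, 0)),   # 3 days late: breach
    (datetime(2024, 5, 3, 6, 0), datetime(2024, 5, 3, 6, 30)),  # 30 min late: OK
]

def timeliness_score(deliveries) -> float:
    """% of loads that arrived within the SLA window of their expected time."""
    on_time = sum(1 for expected, actual in deliveries
                  if actual - expected <= SLA_WINDOW)
    return 100 * on_time / len(deliveries)

print(f"{timeliness_score(deliveries):.0f}%")  # 67%
```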

Myth vs Reality

Myth

"We need 100% data quality"

Reality

100% is impossible and expensive to chase. The right target depends on use case: financial reconciliation needs 99.9%; product analytics is fine at 95%; ML feature stores often work at 90% with appropriate handling. Set quality SLAs per dataset and per use case. Universal '100% quality' policies waste budget and miss the dimensions that actually matter.

Myth

"Data quality tools (Monte Carlo, Soda, Great Expectations) solve data quality"

Reality

Tools detect issues; they don't fix them. The fix is always a human/process intervention upstream. Companies that buy a DQ tool without changing source-system ownership and remediation processes get the same broken data, just with better alerts. The tool is necessary but not sufficient; the operating model is the actual fix.


Knowledge Check

Your customer dataset scores: Completeness 95%, Accuracy 92%, Consistency 88%, Timeliness 65%, Validity 96%, Uniqueness 91%. The marketing team uses this dataset for triggered emails (real-time campaigns). What is the most important dimension to fix first?

Industry benchmarks

Is your number good?

Calibrate against real-world tiers. Use these ranges as targets, not absolutes.

Composite Data Health Score by Tier (mid-to-large enterprises with mature data governance):

  • Tier-1 Critical (financial, customer master): target ≥ 98%
  • Tier-2 Important (analytics, ML features): target ≥ 92%
  • Tier-3 Operational (logs, exploratory): target ≥ 80%
  • Below Tier-3 SLA: investigation triggered
  • Crisis: < 70% on Tier-1

Source: https://www.dama.org/cpages/dama-dmbok2 (DAMA-DMBOK Data Quality Framework)

Real-world cases

Companies that lived this.

Verified narratives with the numbers that prove (or break) the concept.

🏠 Airbnb · 2017-present · Success

Airbnb built 'Wall', an internal data quality monitoring system, after a 2017 incident where stale upstream data caused incorrect search ranking for ~24 hours, materially impacting bookings. Wall scores thousands of datasets across completeness, freshness, and consistency dimensions, with explicit SLAs per dataset and named owners who get paged on breach. The data platform team's OKRs include reducing Tier-1 SLA breaches. Wall is widely credited internally with restoring trust in centralized data after years of teams running side-channel pipelines because they didn't trust the warehouse.

  • Datasets monitored: thousands
  • Per-dataset SLAs: defined for Tier-1/2
  • Trigger event: 2017 search ranking incident
  • Outcome: restored centralized data trust

Quality scorecards become powerful when tied to named ownership, paging on breach, and platform-team OKRs. Without those, they are wallpaper.

🚗 Uber · 2018-present · Success

Uber built UDQ (Uber Data Quality), a unified DQ platform monitoring 100,000+ datasets in their data lake. UDQ scores datasets continuously, surfaces SLA breaches, and gates downstream ML model retraining on quality thresholds. Models that depend on Tier-1 features won't auto-retrain if the underlying data fails quality checks, preventing silent ML drift caused by upstream data degradation. The system was developed because earlier ML incidents at Uber were traced to data quality regressions that nobody noticed until business metrics moved.
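The gating pattern described here can be sketched in a few lines; the threshold value and function below are illustrative assumptions, not UDQ's actual interface:

```python
QUALITY_GATE = 92.0  # hypothetical minimum composite score for Tier-1 features

def should_retrain(feature_scores: dict) -> bool:
    """Gate model retraining on upstream data quality: any failure halts it."""
    failing = {name: score for name, score in feature_scores.items()
               if score < QUALITY_GATE}
    if failing:
        # Halt downstream computation and page the dataset owner,
        # rather than silently retraining on degraded data.
        print(f"Retrain blocked; failing datasets: {failing}")
        return False
    return True

# One degraded upstream feature set blocks the whole retrain.
should_retrain({"trip_features": 97.5, "driver_features": 88.0})
```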

  • Datasets monitored: 100,000+
  • ML models gated by DQ: hundreds
  • Coverage: all Tier-1 production data
  • Driver of build: silent ML drift incidents

At ML scale, the scorecard is not just for humans; it gates production model behavior. A quality SLA breach halts downstream computation rather than just sending an email.



Beyond the concept

Turn Data Quality Scorecard into a live operating decision.

Use this concept as the framing layer, then move into a diagnostic if it maps directly to a current bottleneck.

Typical response time: 24h · No retainer required
