KnowMBA Advisory
Data Strategy · Intermediate · 7 min read

Data Product Discovery

Data Product Discovery is the structured process of finding, validating, and prioritizing the data assets your organization (or the market) will pay for or rely on as products. It treats datasets, dashboards, models, and APIs the way PMs treat software: who is the user, what job are they hiring it for, what willingness-to-pay (or willingness-to-rely) exists, and what's the smallest version that proves it. Discovery starts before pipelines are built — interviews, log mining of existing reports, and shadowing analysts uncover the 5-10 'evergreen questions' that get re-asked weekly. Those questions become candidate data products. Without discovery, data teams build 200 dashboards that nobody opens; with it, they build 12 that drive decisions.

Also known as: Data Product Research · Data Use Case Discovery · Data JTBD · Data Opportunity Sizing

The Trap

The trap is letting engineers pick what to build because 'the data is already there.' Convenience-driven roadmaps produce technically clean datasets that solve nobody's problem. The other trap is over-indexing on executive requests — execs ask for what they think they want (a unified KPI dashboard) when the actual blocker is a clean customer ID. Discovery requires saying no to 80% of requests, which feels political. Teams without an explicit prioritization rubric default to loudest-voice-wins, which is how you end up rebuilding the same revenue dashboard four times for four VPs.

What to Do

Run a 4-week discovery sprint before any new data platform investment: (1) Interview 15-20 downstream users — analysts, ops managers, sales — about decisions they make weekly and where data fails them. (2) Log-mine your BI tool: which dashboards have >50 weekly views and which have zero? (3) Score candidates on a 2x2: business value (revenue impact, decision frequency) vs feasibility (data exists, quality acceptable). (4) Pick 3-5 to build as v1 data products with named owners, SLAs, and explicit consumers. Kill everything else from the backlog publicly.
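
For step (2), here is a minimal log-mining sketch in Python, assuming your BI tool can export view events as a CSV with dashboard, user, and timestamp columns and that you keep a separate inventory of all dashboard names. The column names, file layout, and the 50-views-per-week bar are assumptions, not any specific tool's schema:

```python
# Step (2) sketch: log-mine BI view events to find certification and retirement
# candidates. Assumes (a) a CSV export of view events with columns
# dashboard, user, timestamp (ISO format) and (b) a full inventory of dashboard
# names, because dashboards with zero views never appear in the event log.
import csv
from collections import defaultdict
from datetime import datetime, timedelta

def audit_dashboards(events_csv, inventory, weeks=4, certify_threshold=50):
    cutoff = datetime.now() - timedelta(weeks=weeks)
    views = defaultdict(int)      # views per dashboard inside the window
    viewers = defaultdict(set)    # distinct users, feeds 'Number of Users' in the formula below

    with open(events_csv, newline="") as f:
        for row in csv.DictReader(f):
            if datetime.fromisoformat(row["timestamp"]) < cutoff:
                continue
            views[row["dashboard"]] += 1
            viewers[row["dashboard"]].add(row["user"])

    weekly = {d: views.get(d, 0) / weeks for d in inventory}
    certify = sorted((d for d in inventory if weekly[d] >= certify_threshold),
                     key=weekly.get, reverse=True)
    retire = [d for d in inventory if weekly[d] == 0]
    return certify, retire, viewers
```

The two output lists feed step (3): high-traffic dashboards get scored as candidates, and zero-view dashboards go on the public kill list.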

Formula

Data Product Score = (Decision Frequency × Decision Value × Number of Users) ÷ (Build Cost + Maintenance Cost)
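
A worked example of the score in Python. The candidates and every figure are hypothetical; the only real requirement is consistent units (decisions per year, dollars of decision value, annualized build plus maintenance cost) so the ratios are comparable:

```python
# Rank candidate data products by
# (decision frequency x decision value x number of users) / (build + maintenance cost).
# All figures are illustrative: decisions/year, $ per decision, weekly users,
# build cost in $, maintenance cost in $/year.
candidates = [
    ("Transaction failure diagnosis", 5200,   40, 180, 60_000, 20_000),
    ("Unified pipeline view",           52, 2000,  12, 45_000, 15_000),
    ("Real-time cash position",        250, 5000,   3, 90_000, 40_000),
]

def score(freq, value, users, build, maintain):
    return (freq * value * users) / (build + maintain)

for name, *nums in sorted(candidates, key=lambda c: score(*c[1:]), reverse=True):
    print(f"{name:32s} score = {score(*nums):10,.1f}")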

In Practice

Airbnb's data team in 2017 ran a discovery exercise that revealed 80% of their 500+ internal dashboards had under 5 weekly users, while 10 dashboards drove 70% of decisions. They retired 400+ dashboards, formalized the top 10 as 'certified data products' with on-call ownership, and built the Dataportal tool to make discovery searchable. Decision velocity measurably improved and on-call data incidents dropped 50%, because the team stopped supporting noise.

Pro Tips

  • 01. The 'evergreen question' test: if the same question gets asked in Slack 3+ times by different people in 30 days, it's a candidate data product. Search your Slack for 'can someone pull' to find them (a rough search sketch follows these tips).

  • 02. Always interview the analysts who manually answer recurring questions — they know exactly which question patterns recur and which datasets are unreliable. They are your richest discovery source, richer than executives.

  • 03. Discovery is never 'done.' Run a lightweight discovery cycle every quarter. Business priorities change faster than data platforms, and last quarter's evergreen question may now be stale.
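
The Slack search from tip 01 can be rough-cut in a few lines, assuming a standard Slack workspace export (one folder per channel containing per-day JSON files of messages). The trigger phrases, 30-day window, and 3-asker threshold mirror the tip and are meant to be tuned:

```python
# Find 'evergreen question' candidates in a Slack export: messages from the
# last N days containing request phrases, grouped by crudely normalized text
# so the same ask from different people surfaces. Assumes the standard export
# layout: <export_dir>/<channel>/<YYYY-MM-DD>.json, each file a list of messages.
import json, re
from collections import defaultdict
from datetime import datetime, timedelta
from pathlib import Path

PHRASES = ("can someone pull", "does anyone have", "can we get a number on")

def evergreen_candidates(export_dir, days=30, min_askers=3):
    cutoff = (datetime.now() - timedelta(days=days)).timestamp()
    askers = defaultdict(set)   # normalized question text -> distinct users who asked
    for f in Path(export_dir).glob("*/*.json"):
        for msg in json.loads(f.read_text()):
            text = msg.get("text", "").lower()
            if float(msg.get("ts", 0)) < cutoff:
                continue
            if any(p in text for p in PHRASES):
                key = re.sub(r"\s+", " ", text)[:80]   # crude dedup key
                askers[key].add(msg.get("user", "unknown"))
    return {q: len(u) for q, u in askers.items() if len(u) >= min_askers}
```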

Myth vs Reality

Myth: Discovery is just gathering requirements from stakeholders.

Reality: Requirements gathering captures stated needs. Discovery captures revealed needs — what people actually do, what they Slack each other for at 11pm, what manual workarounds they've built. Stated and revealed needs diverge wildly. Discovery without observing actual workflows produces feature lists, not products.

Myth: If we build it, they will come.

Reality: False, expensively. Studies of internal data platforms show 60-70% of built datasets have under 10 monthly active users. Adoption requires distribution: embedding in workflows, training, evangelism, deprecation of competing sources. Build-only strategies waste 40%+ of data engineering capacity.

Try it

Run the numbers.

Pressure-test the concept against your own knowledge — answer the challenge or try the live scenario.


Knowledge Check

Your data team has capacity to build 5 new data products this quarter. You have 47 stakeholder requests. The CFO has personally requested 8 of them. What's the best prioritization approach?

Industry benchmarks

Is your number good?

Calibrate against real-world tiers. Use these ranges as targets — not absolutes.

Data Product Adoption (60 days post-launch)

Internal data products at mid-to-large enterprises

  • Elite: > 70% of target users active weekly
  • Good: 40-70%
  • Average: 20-40%
  • Poor: 5-20%
  • Failed Discovery: < 5%

Source: ThoughtSpot State of Analytics 2024 / DataKitchen Data Ops Benchmarks
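
If you want to check your own number against these tiers programmatically, here is a trivial helper with the boundaries copied from the table above:

```python
# Map a 60-day adoption rate (% of target users active weekly) to the tiers above.
def adoption_tier(pct: float) -> str:
    if pct > 70:  return "Elite"
    if pct >= 40: return "Good"
    if pct >= 20: return "Average"
    if pct >= 5:  return "Poor"
    return "Failed Discovery"

print(adoption_tier(65))   # "Good"
```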

Real-world cases

Companies that lived this.

Verified narratives with the numbers that prove (or break) the concept.


Airbnb (Dataportal) · 2017-2019 · Outcome: success

Airbnb's data platform team discovered that out of 500+ active dashboards, only 10 drove 70% of decisions. They built Dataportal — an internal data discovery tool — to surface high-trust assets and deprecate low-value ones. The exercise was as much about killing data products as launching them. Top dashboards were certified, given on-call owners, and indexed for search. Adoption of the certified set jumped, while ad-hoc data requests fell.

  • Dashboards Audited: 500+
  • Dashboards Driving 70% of Decisions: 10
  • Dashboards Retired: 400+
  • On-call Data Incidents: down ~50%

Discovery is as much about deprecation as creation. The most underrated data product roadmap action is killing things nobody uses but everyone is afraid to delete.


Hypothetical: Mid-Market Insurance Carrier · 2024 · Outcome: pivot

A regional insurance carrier hired a 12-person data team and built 80 dashboards in year one. A discovery audit revealed 62 had fewer than 5 weekly users. The root cause: requirements were gathered from VP-level executives in roadmap meetings, but the actual users — claims adjusters and underwriters — were never interviewed. After a discovery reset focused on adjuster workflows, the team built 6 high-impact tools (adjuster workload balancer, fraud signal alerts) that hit 80%+ adoption. The other 62 were retired.

  • Dashboards Built (Year 1): 80
  • With < 5 Weekly Users: 62 (78%)
  • Post-Discovery Tools Built: 6
  • Adoption Rate (New Tools): 80%+

Executives request ideas; users live the workflows. Discovery that skips frontline users produces dashboards optimized for steering committees, not for the people whose decisions actually create value.

Decision scenario

The Quarterly Roadmap Pitch

You're the Head of Data at a 600-person fintech. Backlog has 53 dashboard/dataset requests. The CRO wants 'a unified pipeline view' (3 weeks of work). The CFO wants 'real-time cash position' (6 weeks). Frontline ops are quietly asking for a 'why did this transaction fail' tool. Your team can ship 4 things this quarter.

Backlog Size

53 requests

Quarterly Capacity

4 data products

Active Stakeholders

12 VPs + 200 ops users

Last Quarter's Adoption

31% avg (poor)

Decision 1

You can either build what the C-suite asked for (politically safe) or run a 2-week discovery sprint before committing. Discovery costs 25% of the quarter's capacity but reframes the roadmap.

Option A: Skip discovery, ship the CRO + CFO requests + 2 more from the backlog. Move fast.
Outcome: You ship 4 dashboards on time. The CRO opens the pipeline view 3 times in 90 days. The CFO uses the cash dashboard daily for 2 weeks, then reverts to her old spreadsheet because the refresh cadence doesn't match her workflow. Adoption averages 28%. Next quarter, the same backlog is bigger and the CEO asks why data ROI is unclear.
Adoption Rate: 31% → 28% · Backlog Size: 53 → 67

Option B: Run a 2-week discovery sprint: shadow the ops team, log-mine the BI tool, and interview the people who'd actually use the CRO/CFO requests. Re-prioritize publicly.
Outcome: Discovery reveals that the CRO's pipeline view duplicates an existing tool nobody uses (a training problem, not a data problem), that the CFO actually needs an hourly bank-account refresh rather than a dashboard, and that ops' 'why did this transaction fail' tool would serve 180 weekly users. You re-pitch: build the failure-diagnosis tool, fix the bank refresh, defer the pipeline view, and retire 11 backlog items as duplicates. Adoption hits 72%. The CRO is annoyed for 3 weeks, then becomes your loudest advocate.
Adoption Rate: 31% → 72% · Backlog Size: 53 → 28 (after deprecations)


Beyond the concept

Turn Data Product Discovery into a live operating decision.

Use this concept as the framing layer, then move into a diagnostic if it maps directly to a current bottleneck.

Typical response time: 24h · No retainer required
