Data Marketplace Strategy
A Data Marketplace is a platform — internal or external — where data products are listed, discovered, evaluated, and provisioned with minimal friction. External examples: Snowflake Data Cloud Marketplace (3,200+ live datasets), AWS Data Exchange, Databricks Marketplace with Delta Sharing. Internal examples: Uber's Databook, Lyft's Amundsen, Netflix's Metacat. Marketplace strategy answers four questions: (1) Are we a buyer, seller, or platform operator? (2) What's our curation model — open, curated, or certified-only? (3) How do we handle data contracts and SLAs across the catalog? (4) What's the discovery + provisioning UX? Marketplaces succeed only when curation discipline exceeds product breadth — 50 high-trust datasets with named owners always outperform 5,000 unmaintained ones.
The Trap
The trap most enterprises fall into is treating the marketplace as a catalog problem instead of a curation problem. They buy a tool (Collibra, Alation, Atlan), index every dataset in the warehouse, and ship a 12,000-asset catalog that nobody trusts. Without aggressive curation — certified tier, deprecation policy, named stewards — the marketplace becomes a graveyard of stale views. The other trap is open-publishing: letting any team list a dataset without quality gates produces a long tail of broken pipelines that erode marketplace trust faster than any single bad dataset would. Snowflake Marketplace's success comes from listing review, not from being permissive.
What to Do
Stand up a marketplace in three waves: (1) Curated Core (months 1-3): hand-pick 20-50 highest-value datasets, establish certified tier with on-call owners, document SLAs (freshness, quality, schema). (2) Federated Listings (months 4-9): allow domain teams to list under 'community' tier with lower trust signals; require basic metadata + ownership. (3) Self-Service Provisioning (months 10+): integrate with access controls so users can request and consume data without ticketing. Measure: % of marketplace traffic going to certified vs community, weekly active users, time-to-first-query.
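The measurements above can be sketched as a small health report. This is a minimal sketch assuming a simple in-memory query log; the field names and event shape are illustrative, not any vendor's schema:

```python
from datetime import datetime, timedelta
from statistics import median

# Illustrative query-log records: (user, dataset_tier, queried_at, first_seen_at).
# first_seen_at is the user's first marketplace visit (an assumed field).
events = [
    ("ana",  "certified", datetime(2024, 5, 6, 10), datetime(2024, 5, 6, 9)),
    ("ana",  "community", datetime(2024, 5, 7, 11), datetime(2024, 5, 6, 9)),
    ("bob",  "certified", datetime(2024, 5, 8, 15), datetime(2024, 5, 8, 9)),
    ("cleo", "certified", datetime(2024, 5, 9, 12), datetime(2024, 5, 9, 11)),
]

def marketplace_health(events, week_start):
    week_end = week_start + timedelta(days=7)
    in_week = [e for e in events if week_start <= e[2] < week_end]
    # Share of marketplace traffic hitting certified-tier assets.
    certified = sum(1 for e in in_week if e[1] == "certified")
    certified_share = certified / len(in_week) if in_week else 0.0
    # Weekly active users: distinct users querying anything this week.
    weekly_active = len({e[0] for e in in_week})
    # Time-to-first-query: hours from first visit to first query, per user.
    first_query = {}
    for user, _, queried_at, first_seen_at in sorted(events, key=lambda e: e[2]):
        first_query.setdefault(user, (queried_at - first_seen_at).total_seconds() / 3600)
    return certified_share, weekly_active, median(first_query.values())

share, wau, ttfq = marketplace_health(events, datetime(2024, 5, 6))
print(f"certified share={share:.0%}, WAU={wau}, median time-to-first-query={ttfq:.1f}h")
```

In practice these three numbers would come from the warehouse's query history, but the aggregation logic is the same.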
In Practice
Snowflake Data Cloud Marketplace launched in 2019 with strict provider vetting and grew to 3,200+ live datasets and 2,000+ providers by 2024 (Weather Source, FactSet, Foursquare, Knoema). Their core innovation: 'data sharing' eliminates copy-and-paste data exchange — consumers query provider data in place via Snowflake's compute, no ETL. This made the marketplace the default integration point for many enterprises. Snowflake refuses to list providers who can't meet freshness/quality bars — the curation discipline is what makes the catalog usable instead of a junk drawer.
Pro Tips
1. The single most predictive marketplace metric is 'time to trust decision': how long a new user spends evaluating whether a dataset is reliable before using it. Under 60 seconds means great UX; over 5 minutes means the catalog is failing its purpose.
2. Always run a deprecation pipeline. Datasets with zero queries in 90+ days should be flagged, their owners notified, and the assets archived. Marketplaces without deprecation become cluttered within 18 months.
3. Tier your trust signals visibly: 'Certified' (SLA + on-call), 'Community' (owned but uncertified), 'Experimental' (no guarantees). Color-code aggressively. Users should never have to read documentation to know which tier they're using.
Myth vs Reality
Myth
“More datasets in the marketplace is better”
Reality
False at scale. Empirical data from large internal marketplaces (Uber, Airbnb, LinkedIn) shows that as catalog size grows past ~1,000 assets, time-to-discovery degrades sharply and trust collapses. The optimal mid-large enterprise catalog is 200-500 curated assets, not 10,000 indexed ones. Breadth without curation destroys utility.
Myth
“External and internal marketplaces are different beasts”
Reality
Largely false. The economic and trust mechanics are identical: producers list, consumers discover, platform curates, trust signals drive adoption. The internal version skips money but still has reputation and political 'currency.' Treating internal data marketplaces with marketplace economics rigor — not as IT catalogs — improves outcomes substantially.
Knowledge Check
Your enterprise marketplace has 4,800 listed datasets after 18 months. Adoption is flat at 12% weekly active users. What's the highest-leverage intervention?
Industry benchmarks
Is your number good?
Calibrate against real-world tiers. Use these ranges as targets — not absolutes.
Internal Marketplace Adoption (% of data users active monthly)
Applies to internal data marketplaces (Atlan, Alation, Collibra, in-house).
Elite (Airbnb-level): > 70%
Strong: 40-70%
Average: 20-40%
Underutilized: 10-20%
Failed Catalog: < 10%
Source: Atlan State of Data Discovery 2024 / DataKitchen Benchmarks
Real-world cases
Companies that lived this.
Verified narratives with the numbers that prove (or break) the concept.
Snowflake Data Cloud Marketplace
2019-Present
Snowflake launched Marketplace in 2019 with a deliberate curation-first approach: providers must meet listing standards (freshness, schema documentation, contact SLA). Combined with Snowflake's data sharing technology — consumers query provider data in place without copy/paste ETL — the marketplace became the default external data integration for many Snowflake enterprises. By 2024 it had grown to 3,200+ datasets and 2,000+ providers. Crucially, Snowflake declined to compete on listing breadth with AWS Data Exchange; they competed on consumption UX (zero-ETL) and curation, which won the high-value enterprise segment.
Live Datasets (2024)
3,200+
Active Providers
2,000+
Year Launched
2019
Differentiator
Zero-ETL data sharing
Marketplace differentiation comes from consumption UX and curation, not from listing volume. Snowflake bet that quality and frictionless access would beat AWS Data Exchange on breadth — and won the enterprise segment.
Databricks Marketplace + Delta Sharing
2022-Present
Databricks launched its Marketplace in 2022 built on Delta Sharing — an open protocol for sharing data across platforms (not just Databricks-to-Databricks). This addressed the lock-in concern that limited Snowflake Marketplace adoption among multi-cloud enterprises. Combined with marketplace listings for AI models and notebooks (not just datasets), Databricks differentiated by widening the range of listing types. By 2024, Marketplace listings included models from Hugging Face partners, dbt models, and analytics templates, making it a general-purpose AI/data exchange instead of a pure data catalog.
Year Launched
2022
Underlying Protocol
Delta Sharing (open)
Listing Types
Datasets, AI models, notebooks
Cross-Platform
Yes (multi-cloud)
Open protocols (Delta Sharing) addressed the lock-in objection that constrained Snowflake's reach. Widening listing types beyond raw data positioned Databricks as the AI exchange, not just a data exchange.
Decision scenario
The Internal Marketplace Re-Launch
You inherited an internal data catalog with 6,200 indexed datasets, 14% monthly active users, and a Slack channel full of 'is this dataset reliable?' questions. Your CTO wants you to 're-launch' the marketplace in 6 months with a clear adoption goal.
Indexed Datasets
6,200
Monthly Active Users
14%
Trust Complaints (Slack/quarter)
~120
Certified Datasets
0 (no tier exists)
Decision 1
You can either expand catalog (add 2,000 more datasets from new sources) or contract aggressively (deprecate 4,000+ stale assets, certify 100, rebuild trust UX).
Expand: more datasets mean more users will find what they need. Add the 2,000 new datasets and improve search.
Contract: deprecate 4,000 stale assets, certify the top 100 most-used with named owners and SLAs, rebuild UX so default search shows certified-only. ✓ Optimal
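The 'contract' option can be sketched as a usage-based triage over the catalog. This is a minimal sketch assuming a 90-day query-count export per asset; the cutoffs and sample data are illustrative, not a policy from any of the companies discussed:

```python
def triage(assets, certify_n=100, stale_threshold=0):
    """Split a catalog into certify / community / deprecate buckets by 90-day usage.

    `assets` is a list of (name, query_count_90d) pairs. Top `certify_n` assets
    with real usage become certification candidates; zero-usage assets are
    queued for deprecation; everything else stays in the community tier.
    """
    ranked = sorted(assets, key=lambda a: a[1], reverse=True)
    certify = [a for a in ranked[:certify_n] if a[1] > stale_threshold]
    rest = ranked[len(certify):]
    community = [a for a in rest if a[1] > stale_threshold]
    deprecate = [a for a in rest if a[1] <= stale_threshold]
    return certify, community, deprecate

# Toy catalog standing in for the 6,200-asset inventory.
catalog = [("orders", 900), ("users", 640), ("clicks_v1", 0), ("ref_geo", 12), ("tmp_export", 0)]
certify, community, deprecate = triage(catalog, certify_n=2)
print(len(certify), len(community), len(deprecate))
```

Run against real query history, a split like this gives the re-launch its headline numbers: how many assets get named owners and SLAs, and how many get archived before the new UX ships.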