Data Cost Optimization
Data Cost Optimization is the discipline of running the warehouse, lakehouse, and surrounding data tooling at the right cost: not the lowest cost, but the cost that matches the value the data creates. It's FinOps applied to the data stack: Snowflake credits, BigQuery slot or on-demand spend, Databricks DBUs, S3 storage, ETL/CDC license fees, BI tool seats. The honest test is whether the data team can answer two questions in under an hour: 'What did we spend on data infrastructure last month, broken down by team, pipeline, and dashboard?' and 'Which of our top 10 most expensive queries are actually creating proportional business value?' Most companies cannot. Data infrastructure cost compounds silently; most companies discover their Snowflake bill is 3x what it needed to be only when the CFO starts asking pointed questions. Optimization is the discipline of catching that compounding before it becomes a board-level event.
The Trap
The trap is treating data cost as a one-time clean-up project ('we did a Snowflake optimization last year, we're good') when it's actually a continuous practice: every new pipeline, every new dashboard, every new analyst with autocomplete in their SQL editor adds compounding spend. The other trap is over-optimizing on the wrong axis, like saving $50K of warehouse spend by killing a query that drives a $500K business decision. The KnowMBA POV: most companies don't have a data cost problem; they have a data cost VISIBILITY problem. They can't tell which dashboard, team, or pipeline is responsible for which dollars. Once visibility exists, the optimizations are usually obvious: kill the orphaned dashboards (on average, 40% of dashboards have had no viewer in the last 90 days), right-size warehouses and tighten auto-suspend (most warehouses are oversized 2-4x), partition the giant tables, and switch dev workloads to smaller compute. These are not exotic techniques; they're discipline, and discipline requires visibility to execute.
What to Do
Build cost as a first-class operational metric. Step 1: instrument cost attribution; every query, dashboard, pipeline, and team needs a tag that flows into your cost reporting. Snowflake's QUERY_HISTORY, BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT, and Databricks system tables all expose the raw data (a sketch follows below). Step 2: publish a weekly cost dashboard broken down by team, pipeline, dashboard, and top queries. Step 3: identify the 80/20; typically the top 50 queries drive 60-80% of warehouse spend. Step 4: act on the obvious: kill orphaned dashboards (zero views in 90 days), right-size warehouses (most XL warehouses can be M with a longer auto-suspend), pre-aggregate the most expensive recurring queries, and materialize what should be materialized. Step 5: implement guardrails: cost budgets per team with alerts at 80% of budget, query timeouts, materialized view governance. Step 6: review monthly with team leads; make cost a shared metric, not a central data team problem.
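A minimal sketch of Step 1 on Snowflake, assuming teams tag their sessions via QUERY_TAG (the 'team:growth' naming convention and the proration-by-elapsed-time heuristic are assumptions, not Snowflake requirements; note that ACCOUNT_USAGE views lag real time by up to a few hours):

```sql
-- Illustrative tag convention; any stable team identifier works.
ALTER SESSION SET QUERY_TAG = 'team:growth';

-- Attribute each warehouse's metered credits to teams by prorating
-- across the query execution time each QUERY_TAG carried.
-- Proration by elapsed time is a heuristic, not exact billing.
WITH query_seconds AS (
    SELECT warehouse_name,
           COALESCE(NULLIF(query_tag, ''), 'untagged') AS team_tag,
           SUM(total_elapsed_time) / 1000 AS exec_seconds
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
    GROUP BY 1, 2
), warehouse_totals AS (
    SELECT warehouse_name, SUM(exec_seconds) AS wh_seconds
    FROM query_seconds
    GROUP BY 1
), warehouse_credits AS (
    SELECT warehouse_name, SUM(credits_used) AS credits
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
    GROUP BY 1
)
SELECT q.team_tag,
       ROUND(SUM(c.credits * q.exec_seconds / t.wh_seconds), 1) AS approx_credits
FROM query_seconds q
JOIN warehouse_totals t USING (warehouse_name)
JOIN warehouse_credits c USING (warehouse_name)
GROUP BY 1
ORDER BY approx_credits DESC;
```

The 'untagged' row is itself a finding: it measures how much spend has no owner, which is the first number to drive toward zero.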
Formula
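There's no single canonical formula here, but a serviceable first-order model of warehouse spend is:

Warehouse spend ≈ Σ across warehouses (size multiplier in credits/hour × hours resumed × $ per credit)

On Snowflake the size multiplier roughly doubles per size (1 / 2 / 4 / 8 / 16 credits per hour for XS / S / M / L / XL). Each lever in the playbook attacks one factor: right-sizing cuts the multiplier, aggressive auto-suspend cuts hours resumed, workload separation keeps small jobs off big multipliers, and committed-use contracts cut the per-credit price. Attribution doesn't change the formula; it tells you who owns each term.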
In Practice
BigQuery and Snowflake both publish extensive cost optimization guides because their customers consistently overspend by 30-60% before applying basic optimization. Snowflake's published patterns: right-sizing warehouses, aggressive auto-suspend (60 seconds vs the 10-minute default), separating workloads by warehouse, partitioning large tables, and killing zero-view dashboards. BigQuery's patterns: switching from on-demand pricing to flat-rate slots once spend stabilizes, partitioning and clustering, materialized views, and BI Engine for hot dashboards. Public case studies (DoorDash, Instacart, Coinbase, Pinterest, Wise) repeatedly show 30-60% cost reductions with no business value lost, purely from visibility plus discipline. Wise published a 2022 engineering blog detailing how cost attribution and per-team budgets reduced their data platform spend ~40% while maintaining the same data product quality. The pattern is so consistent across companies that 'we don't have a cost problem' is almost certainly wrong if you haven't measured.
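To make the visibility point concrete on BigQuery, a sketch of a per-user on-demand spend report from job metadata; the $6.25/TiB rate and the US multi-region qualifier are assumptions (check your region's current list price), and reserved-slot capacity bills differently and is ignored here:

```sql
-- Approximate 30-day on-demand spend per user from job metadata.
DECLARE price_per_tib FLOAT64 DEFAULT 6.25;  -- assumption: verify the current rate

SELECT user_email,
       ROUND(SUM(total_bytes_billed) / POW(1024, 4) * price_per_tib, 2) AS approx_usd,
       COUNT(*) AS query_jobs
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND statement_type != 'SCRIPT'  -- exclude parent scripts to avoid double counting
GROUP BY user_email
ORDER BY approx_usd DESC
LIMIT 20;
```

Swap user_email for a label or dataset dimension and you have the per-team and per-pipeline views these case studies treat as the decisive enabler.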
Pro Tips
- 01
Set warehouse auto-suspend to 60 seconds (not the 10-minute default). On Snowflake, this single change typically cuts warehouse spend 15-30% for spiky workloads with no impact on user experience. The 10-minute default exists for the vendor's revenue, not your benefit.
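On Snowflake this is one statement per warehouse (analytics_wh is a placeholder name; AUTO_SUSPEND is specified in seconds):

```sql
-- 60s is the practical floor for spiky BI workloads: Snowflake bills
-- per second with a 60-second minimum on each resume, so going lower buys little.
ALTER WAREHOUSE analytics_wh SET AUTO_SUSPEND = 60;
```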
- 02
Audit dashboard usage quarterly. Across most BI tools, 30-50% of dashboards have zero views in the trailing 90 days. Each one runs scheduled refreshes consuming warehouse credits for nobody. Killing orphaned dashboards is the highest-ROI cost optimization most teams haven't done.
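The mechanics are BI-tool-specific (Looker, Tableau, and most peers expose an activity log), but the audit query has the same shape everywhere. A sketch against a hypothetical bi_usage schema exported from such a log:

```sql
-- Hypothetical tables: bi_usage.dashboards(dashboard_id, name, owner)
-- and bi_usage.dashboard_views(dashboard_id, viewed_at).
SELECT d.dashboard_id, d.name, d.owner
FROM bi_usage.dashboards d
LEFT JOIN bi_usage.dashboard_views v
       ON v.dashboard_id = d.dashboard_id
      AND v.viewed_at >= DATEADD('day', -90, CURRENT_TIMESTAMP())
WHERE v.dashboard_id IS NULL  -- zero views in the trailing 90 days
ORDER BY d.owner, d.name;
```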
- 03
Separate dev/staging from production warehouses with strict size differences. Most companies run dev queries on the same XL warehouse as production, multiplying spend. A small dev warehouse + production XL with workload separation typically cuts 20-30% with zero analyst friction once they get used to it.
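A minimal Snowflake sketch of the separation (all object names are placeholders):

```sql
-- Small, aggressively auto-suspending warehouse for exploratory work.
CREATE WAREHOUSE IF NOT EXISTS dev_wh
  WITH WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

GRANT USAGE ON WAREHOUSE dev_wh TO ROLE analyst_dev;

-- Default dev users onto the small warehouse so exploratory queries
-- stop landing on production compute.
ALTER USER jane_doe SET DEFAULT_WAREHOUSE = dev_wh;
```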
Myth vs Reality
Myth
"Modern cloud warehouses are cheap; cost optimization isn't worth the engineering time"
Reality
Snowflake, BigQuery, and Databricks bills regularly hit 7-8 figures at mid-to-large enterprises within 2-3 years of adoption. The 'cheap per query' story is true at small scale and aggressively wrong at the scale where most enterprise data spend lives. The vendors are great at making the first $100K/year easy and the next $5M/year a board-level conversation.
Myth
"Reserved capacity / committed-use contracts solve cost optimization"
Reality
Commitments lock in a baseline at a discount but do nothing about wasteful queries, oversized warehouses, or orphaned dashboards. Buying $2M of committed Snowflake capacity to spend $2.5M instead of $3M is better than not buying it, but vastly worse than buying $1M of capacity to spend $1.2M after the optimizations that should have happened first. Optimization should precede commitment, not replace it.
Try it
Run the numbers.
Pressure-test the concept against your own knowledge: answer the challenge or try the live scenario.
Knowledge Check
A 600-person company's Snowflake bill jumped from $400K/year to $1.4M over 24 months with no proportional growth in users or data volume. The CFO has demanded an explanation. The data team has no per-team or per-dashboard cost attribution. What is the right first step?
Industry benchmarks
Is your number good?
Calibrate against real-world tiers. Use these ranges as targets, not absolutes.
Typical Cloud Warehouse Optimization Savings (after visibility + standard playbook)
Based on published case studies (Wise, DoorDash, Instacart, Coinbase) of Snowflake/BigQuery cost optimization.
Aggressive optimization possible
40-60% reduction
Standard optimization
25-40% reduction
Already partially optimized
10-25% reduction
Mature FinOps practice
< 10% additional reduction
Source: https://docs.snowflake.com/en/user-guide/cost-controlling-spending
Real-world cases
Companies that lived this.
Verified narratives with the numbers that prove (or break) the concept.
Wise (formerly TransferWise)
2021-2022
Wise published an engineering blog detailing how their data platform team reduced spend ~40% over 12 months by implementing per-team cost attribution, killing orphaned dashboards, right-sizing warehouses, and adopting per-team cost budgets with weekly review. The decisive enabler was attribution: once cost was visible at the team and dashboard level, the optimizations were obvious and largely self-driven by team leads who didn't want to be on the high-spend list. The platform team's role shifted from 'gatekeeper' to 'visibility provider'. The pattern (attribution → visibility → distributed action) generalizes well beyond Wise.
Annual Spend Reduction
~40%
Primary Mechanism
Per-team attribution + budgets
Implementation Time
~12 months
Business Value Lost
None reported
Cost optimization is a visibility problem first, an engineering problem second. Once teams can see their own spend, distributed optimization happens almost automatically.
Snowflake (published cost optimization guide)
2020-present
Snowflake publishes extensive cost optimization documentation precisely because their customers consistently overspend by 30-60% before applying basic discipline. Their published recommendations: aggressive auto-suspend (60 seconds vs the 10-minute default), right-sizing warehouses, separating workloads by warehouse, query tagging for attribution, materialized views for recurring expensive queries, and resource monitors with budget alerts. The guidance is genuinely good; Snowflake's incentive on optimization is mixed (less spend means less revenue), but they recognize that customer cost surprises drive churn.
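The budget-alert item, for instance, is Snowflake's resource monitor feature and amounts to a few lines of DDL (the names and the 1,000-credit quota are placeholders):

```sql
-- Monthly credit budget: notify at 80%, hard-suspend the warehouse at 100%.
CREATE RESOURCE MONITOR analytics_budget
  WITH CREDIT_QUOTA = 1000
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 80 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = analytics_budget;
```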
Typical Pre-Optimization Overspend
30-60%
Standard Playbook Items
8-12 documented patterns
Most Impactful Single Change
60-second auto-suspend
Customer Churn Driver Snowflake Cites
Cost surprise
Read the vendor's own cost optimization guide first. They publish it precisely because they know their customers are wasting money in predictable, well-documented ways.
Hypothetical: B2B SaaS
2022-2023
An 800-person B2B SaaS saw Snowflake spend climb from $600K/year to $2.8M over 30 months, driven by no single decision, just the silent compounding of new dashboards, larger warehouses, and dev workloads creeping onto prod compute. The CFO demanded answers in late 2023. The data team had no per-team attribution and could not explain where the spend went. After a panicked 8-week optimization sprint (attribution, dashboard cleanup, right-sizing, dev/prod separation), spend dropped to $1.7M, a 39% reduction with no business value lost. The post-mortem identified that distributed cost ownership and quarterly reviews would have prevented the entire trajectory; the team was managing data, not data cost.
Spend Trajectory
$600K → $2.8M over 30 months
Post-Optimization Spend
$1.7M (-39%)
Time to Optimize
8 weeks (panic mode)
Business Value Lost
None
Data cost compounds silently because no one is responsible for it as a continuous metric. The CFO eventually notices; the question is whether you've built the visibility before that conversation.
Decision scenario
The Snowflake Bill Conversation
You're VP of Data at a 700-person SaaS company. Your Snowflake bill has climbed from $500K/year to $2.2M over 24 months. The CFO has just sent a message: 'Need a plan to cut data infrastructure spend 30% by next quarter without breaking the business.' The CTO suggests buying a 3-year committed-use contract for a 25% discount on per-credit rates. Your data engineering lead suggests a 6-week optimization sprint. The CFO wants results in 90 days.
Current Annual Spend
$2.2M
Cost Attribution Coverage
0% (no tags, no per-team breakdown)
Dashboards with Zero Views (90d)
Estimated 40% (unverified)
Warehouse Auto-Suspend
10 minutes (default)
CFO-Imposed Deadline
90 days
Decision 1
You can either commit immediately for the discount, run the optimization sprint first, or do both in parallel. The 90-day clock is real.
Sign the 3-year committed-use contract for a 25% per-credit discount on the $2.2M baseline. Defer optimization to later.
Run a 6-week optimization sprint first: implement cost attribution (weeks 1-2), kill orphaned dashboards (week 3), right-size warehouses and set 60-second auto-suspend (week 4), separate dev/prod (weeks 5-6). Then commit at the new, lower run-rate. ✓ Optimal