Data Engineering Practice
A Data Engineering Practice is the team and operating model responsible for the pipes: ingestion, storage, orchestration, schema management, and the underlying compute platform. Their work product is reliable, performant, well-modeled raw data, not dashboards or insights. They own SLAs on data freshness and pipeline uptime; they own the cost of the warehouse; they own the schema evolution policy. A healthy DE practice runs like an SRE team for data: on-call rotations, post-mortems, capacity planning, and a roadmap measured in 'platform reliability' metrics, not 'tickets closed.'
The Trap
The trap is staffing a data engineering team and then expecting them to also write analytics queries, build dashboards, and answer business questions. This is the 'one team does everything' anti-pattern. KnowMBA POV: data engineers are software engineers who happen to work on data; most are weak at SQL business logic and uninterested in stakeholder management. Asking them to do analytics work both drives them to quit and produces bad analytics. The fix is splitting data engineering from analytics engineering (the next step in the maturity model).
What to Do
Define your DE practice charter on three axes: (1) Scope: they own raw + staging layers and the platform; analytics engineers own the modeled marts. (2) Reliability targets: publish freshness and uptime SLAs (e.g., 99.5% on-time delivery for tier-1 pipelines). (3) On-call rotation: a real one, with a runbook and a paging tool. If you can't fund all three, you have an 'ad-hoc data engineering effort,' not a practice.
Formula
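The 99.5% on-time target above reduces to a simple ratio: on-time runs divided by scheduled runs over the measurement window. A minimal sketch of the calculation (the run records and field names are hypothetical; in practice this data comes from your orchestrator's metadata):

```python
# On-time delivery = runs that landed before their freshness deadline / scheduled runs.
# Hypothetical run log for one tier-1 pipeline over four days.
runs = [
    {"scheduled": "2024-06-01", "late": False},
    {"scheduled": "2024-06-02", "late": False},
    {"scheduled": "2024-06-03", "late": True},   # missed its freshness deadline
    {"scheduled": "2024-06-04", "late": False},
]

on_time = sum(1 for r in runs if not r["late"])
reliability = on_time / len(runs)   # 3 / 4 = 0.75

SLA = 0.995                         # 99.5% on-time target for tier-1 pipelines
print(f"reliability = {reliability:.1%}, SLA met: {reliability >= SLA}")
```

Note the window matters: a 99.5% monthly SLA on a daily pipeline allows well under one late run per month, which is why the target only makes sense alongside a real on-call rotation.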
In Practice
Netflix's Data Platform team famously operates one of the world's largest data infrastructures (multi-petabyte daily processing) with strict reliability SLAs. Their engineering blog details on-call rotations, formal post-mortems for pipeline outages, and dedicated platform PMs, practices borrowed from SRE, not from traditional analytics teams. The result: thousands of internal users self-serve on a platform that almost never goes down, and the central data engineering team scales sub-linearly with usage.
Pro Tips
- 01
Adopt SRE practices wholesale: error budgets, blameless post-mortems, SLO/SLI/SLA distinction, and weekly operational reviews. Data engineering is closer to SRE than to traditional ETL engineering.
- 02
Build a 'pipeline catalog' that lists every pipeline with: owner, tier (1/2/3), freshness SLA, and last 30-day reliability score. This single artifact unlocks executive accountability and engineering prioritization.
- 03
The single biggest skill gap in data engineering hiring is software engineering rigor: testing, version control, code review, CI/CD. Hire from backend engineering rather than from BI/ETL backgrounds and you'll see immediate quality improvement.
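The 'pipeline catalog' in tip 02 can start as something very small. A sketch, assuming you can export owner, tier, SLA, and a 30-day reliability score per pipeline from your orchestrator (the field names and entries are illustrative):

```python
from dataclasses import dataclass

@dataclass
class PipelineEntry:
    name: str
    owner: str              # team accountable for the pipeline
    tier: int               # 1 = business-critical, 3 = best-effort
    freshness_sla: str      # human-readable deadline, e.g. "by 06:00 UTC daily"
    reliability_30d: float  # fraction of on-time runs over the last 30 days

catalog = [
    PipelineEntry("billing_daily", "data-eng", 1, "by 06:00 UTC daily", 0.998),
    PipelineEntry("marketing_attribution", "data-eng", 2, "by 09:00 UTC daily", 0.971),
]

# Surface tier-1 pipelines breaching a 99.5% target for the weekly ops review.
breaches = [p for p in catalog if p.tier == 1 and p.reliability_30d < 0.995]
```

Even a version of this in a spreadsheet works; the point is that every pipeline has exactly one owner, one tier, and one number attached to it.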
Myth vs Reality
Myth
“Data engineers and analytics engineers are interchangeable”
Reality
They are different jobs requiring different skills. Data engineers optimize Spark jobs and design Kafka topologies; analytics engineers write dbt models and define metric semantics. The Venn overlap is maybe 30%. Treating them as one role produces a team that is mediocre at both.
Myth
“Pipelines should be built by whoever needs the data”
Reality
Decentralized pipeline ownership without governance creates a maintenance nightmare. Within 18 months you have 200 hand-rolled pipelines, no one knows which are still in use, and the warehouse bill is 3x what it should be. Centralized DE practice + decentralized analytics engineering is the proven model.
Try it
Run the numbers.
Pressure-test the concept against your own knowledge: answer the challenge or try the live scenario.
Knowledge Check
Your data engineering team is constantly being pulled into 'urgent' analytics requests from sales and marketing. Reliability of core pipelines is declining. What is the structural fix?
Industry benchmarks
Is your number good?
Calibrate against real-world tiers. Use these ranges as targets, not absolutes.
Tier-1 Pipeline Reliability
Business-critical data pipelines (revenue, billing, executive reporting)
Elite
≥ 99.9%
Strong
99-99.9%
Acceptable
98-99%
Poor
< 98%
Source: Hypothetical: KnowMBA synthesis from Monte Carlo State of Data Quality 2024 + practitioner interviews
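The benchmark bands above can be encoded directly, for example to annotate each row of a pipeline catalog. A small sketch:

```python
def reliability_tier(on_time_rate: float) -> str:
    """Map a tier-1 pipeline's 30-day on-time rate to the benchmark band above."""
    if on_time_rate >= 0.999:
        return "Elite"
    if on_time_rate >= 0.99:
        return "Strong"
    if on_time_rate >= 0.98:
        return "Acceptable"
    return "Poor"

print(reliability_tier(0.995))  # Strong
```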
Real-world cases
Companies that lived this.
Verified narratives with the numbers that prove (or break) the concept.
Netflix Data Platform
2015-Present
Netflix's data platform team operates one of the world's largest data infrastructures, processing multi-petabyte volumes daily for billions of viewing events. They publish their architecture and operational practices openly: dedicated on-call rotations, formal post-mortems, error budgets, and platform PMs. The result is a self-serve platform used by thousands of internal users with sub-linear central headcount growth; the team has grown a fraction as fast as the data volume.
Daily Data Volume
Multi-PB scale
Internal Users
Thousands
Operating Model
SRE-style rotations + post-mortems
Headcount Growth vs Usage Growth
Sub-linear
Treat data engineering like infrastructure SRE, not like a service desk. The discipline that comes from on-call, SLOs, and post-mortems is what enables sub-linear scaling.
Decision scenario
Scaling the Data Engineering Team
You're the head of data at a 400-person company. Your data engineering team of 6 is drowning. Pipeline reliability has dropped from 98% to 91% over six months. The CFO is asking for a hiring case. The CTO is asking why you can't 'just use AI to fix this.'
DE Team Size
6 engineers
Total Pipelines
320
Tier-1 Pipeline Reliability (6mo ago)
98%
Tier-1 Pipeline Reliability (today)
91%
Open Pipeline Backlog
47 requests
Decision 1
Investigation reveals: 60% of the team's time is spent on ad-hoc analytics requests from business teams (which they're poorly suited to). Only 40% goes to actual platform work. The reliability drop correlates exactly with when the central BI team was 'consolidated' into data engineering.
Hire 4 more data engineers to absorb the workload
Restructure: keep 5 data engineers focused only on platform + pipelines; hire 3 analytics engineers to own modeled data and serve business teams ✓ Optimal
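The restructure option is ultimately an FTE arithmetic argument, using only the numbers from the scenario above:

```python
team_size = 6
platform_share = 0.40   # only 40% of the team's time goes to platform work today
total_pipelines = 320

# Effective platform capacity today: 6 engineers * 40% = 2.4 FTE,
# covering 320 pipelines, i.e. ~133 pipelines per platform FTE.
effective_fte = team_size * platform_share
pipelines_per_fte = total_pipelines / effective_fte

# After restructuring: 5 data engineers at ~100% platform focus = 5 FTE,
# more than doubling platform capacity without hiring a single data engineer;
# the 3 analytics engineers absorb the ad-hoc business requests separately.
restructured_fte = 5 * 1.0
print(effective_fte, round(pipelines_per_fte), restructured_fte)
```

This is why adding 4 data engineers is the weaker option: it buys roughly 1.6 effective platform FTE (4 × 40%) at the cost of 4 salaries, while the restructure recovers 2.6 platform FTE and staffs the analytics work with people suited to it.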
Related concepts
Keep connecting.
The concepts that orbit this one; each one sharpens the others.
Beyond the concept
Turn Data Engineering Practice into a live operating decision.
Use this concept as the framing layer, then move into a diagnostic if it maps directly to a current bottleneck.
Typical response time: 24h · No retainer required