Operations · Intermediate · 7 min read

Total Productive Maintenance

Total Productive Maintenance (TPM) is a system, developed within the Toyota group, in which operators, not a separate maintenance department, own the day-to-day care of their equipment, with the goal of zero breakdowns, zero defects, and zero accidents. The headline metric is OEE (Overall Equipment Effectiveness) = Availability × Performance × Quality. World-class OEE is 85%+; many plants run 40-60% and don't realize it. TPM has eight pillars, but the operational core is two: Autonomous Maintenance (operators do cleaning, lubrication, tightening, inspection) and Planned Maintenance (scheduled interventions before failure). The KnowMBA take: TPM applies brutally well to knowledge work. Your CI pipeline, your Kubernetes cluster, and your data warehouse are 'machines' that need scheduled care. SaaS teams that treat infra like consumable hardware (only fix when broken) can burn as much as 40% of engineering hours on incidents that planned maintenance would have prevented.

Also known as: TPM, Autonomous Maintenance, Operator-Driven Reliability, Zero Breakdown Maintenance

The Trap

Companies install a 'TPM program' as a separate initiative run by the maintenance manager, then wonder why nothing changes. The whole point is that OPERATORS own equipment care — if you carve out a 'TPM team,' you've recreated the bureaucracy TPM was designed to dissolve. The other trap: chasing OEE as a vanity number. A station with 95% OEE that's not the bottleneck is irrelevant; a bottleneck running at 60% OEE is the only thing that matters. And measuring availability without measuring quality leads to producing fast garbage — high availability + low quality = lots of rework dressed as throughput.

What to Do

Pick one critical piece of equipment (your bottleneck, per Theory of Constraints). Measure baseline OEE for two weeks: clock every minute of downtime by category (breakdown, changeover, minor stop, idle). Then run a 5-day kaizen blitz: deep-clean the equipment with operators (cleaning exposes problems such as leaks, loose bolts, and wear that you'd never notice while the machine is running), build a one-page Autonomous Maintenance checklist (daily 5-min checks, weekly 20-min checks), and move predictable failures from 'breakdown' to 'planned.' Re-measure OEE after 60 days. Expect a 10-20 point improvement in the first cycle without buying anything.
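The baseline measurement step can be sketched in a few lines of Python. This is an illustration only: the log entries, the 480-minute shift, and the function name `summarize_downtime` are assumptions, not part of any TPM standard. For simplicity the sketch counts every logged minute against availability, even though stricter OEE accounting treats minor stops as a performance loss.

```python
from collections import defaultdict

# Hypothetical downtime log for one shift: (category, minutes).
# Categories are the four named in the text; the numbers are invented.
downtime_log = [
    ("breakdown", 35),
    ("changeover", 40),
    ("minor stop", 3),
    ("minor stop", 4),
    ("idle", 10),
    ("minor stop", 2),
]


def summarize_downtime(log, planned_minutes):
    """Return downtime per category (largest first) and the resulting availability."""
    totals = defaultdict(int)
    for category, minutes in log:
        totals[category] += minutes
    # Rank categories so the biggest loss is attacked first.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    run_minutes = planned_minutes - sum(totals.values())
    availability = run_minutes / planned_minutes
    return ranked, availability


ranked, availability = summarize_downtime(downtime_log, planned_minutes=480)
for category, minutes in ranked:
    print(f"{category:<12}{minutes:>4} min")
print(f"Availability: {availability:.1%}")
```

Running the same summary every shift for the two-week baseline gives a ranked loss list to bring into the kaizen blitz.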

Formula

OEE = Availability × Performance × Quality, where:
Availability = Run Time / Planned Time
Performance = (Ideal Cycle Time × Total Count) / Run Time
Quality = Good Count / Total Count
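As a worked example, the formula translates directly into a small function. The shift figures below are hypothetical, and the function and parameter names are chosen for this sketch.

```python
def oee(planned_time, run_time, ideal_cycle_time, total_count, good_count):
    """Compute the three OEE factors and their product.

    Times are in minutes; ideal_cycle_time is minutes per unit.
    """
    availability = run_time / planned_time
    performance = (ideal_cycle_time * total_count) / run_time
    quality = good_count / total_count
    return availability, performance, quality, availability * performance * quality


# Hypothetical shift: 480 planned minutes, 400 minutes actually running,
# ideal cycle of 0.2 min/unit, 1,600 units made, 1,520 of them good.
availability, performance, quality, score = oee(480, 400, 0.2, 1600, 1520)
print(f"Availability {availability:.1%}, Performance {performance:.1%}, "
      f"Quality {quality:.1%}, OEE {score:.1%}")
```

Because run time and total count cancel out, the product reduces to (Ideal Cycle Time × Good Count) / Planned Time, which is a quick way to sanity-check a reported OEE.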

In Practice

Toyota's Tahara plant, long regarded as one of the most advanced auto plants in the world, runs OEE above 85% on equipment that competitors average 50% on. The difference isn't better machines; it's that every operator at Tahara starts their shift with a 10-minute machine inspection (oil levels, belt tension, sensor cleaning) and stops the line at the first abnormal sound. Seiichi Nakajima, who formalized TPM in the 1970s at Nippondenso, a Toyota supplier, showed that operator-owned care eliminates 70%+ of unplanned downtime, because operators feel the machine every day and notice changes a maintenance tech on a quarterly visit never would.

Pro Tips

  • 01

    Seiichi Nakajima's six big losses to attack in order: (1) breakdowns, (2) setup/adjustments, (3) minor stops, (4) reduced speed, (5) startup defects, (6) production defects. Most plants only see #1 because it's loud. The hidden killer is #3 — minor stops under 5 minutes that no one logs but consume 15-20% of capacity.

  • 02

    World-class OEE = 85% (Availability 90% × Performance 95% × Quality 99.9%). Below 65% means major losses you don't see. The first time you measure honestly, you'll be shocked how low your number is — that's normal, and it's where the gold is.

  • 03

    For SaaS: your TPM equivalent is incident postmortems + scheduled chaos engineering + planned dependency upgrades. Teams that 'don't have time' for these run at ~40% engineering OEE — most hours go to incidents and rework, not feature throughput.
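Tips 01 and 02 can be tied together in a short sketch. It assumes the conventional pairing of the six big losses with the three OEE factors (two losses per factor). The logged minutes are hypothetical, and `losses_by_factor` is a name chosen for this example.

```python
# Conventional pairing of the six big losses with the OEE factor each erodes.
SIX_BIG_LOSSES = {
    "breakdowns": "availability",
    "setup/adjustments": "availability",
    "minor stops": "performance",
    "reduced speed": "performance",
    "startup defects": "quality",
    "production defects": "quality",
}


def losses_by_factor(loss_minutes):
    """Sum logged loss minutes under the OEE factor each loss belongs to."""
    totals = {"availability": 0, "performance": 0, "quality": 0}
    for loss, minutes in loss_minutes.items():
        totals[SIX_BIG_LOSSES[loss]] += minutes
    return totals


# Hypothetical shift log, in minutes lost per loss type.
logged = {
    "breakdowns": 30,
    "setup/adjustments": 40,
    "minor stops": 70,
    "reduced speed": 25,
    "startup defects": 10,
    "production defects": 5,
}
factor_totals = losses_by_factor(logged)
print(factor_totals)

# The world-class benchmark from tip 02, multiplied out.
world_class = 0.90 * 0.95 * 0.999
print(f"World-class OEE: {world_class:.1%}")  # 85.4%
```

In this invented log the performance bucket is the largest, which is the pattern tip 01 warns about: unlogged minor stops outweigh the breakdowns everyone notices.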

Myth vs Reality

Myth

TPM is just preventive maintenance with a fancy name

Reality

Preventive maintenance is scheduled by a maintenance team. TPM transfers ownership to operators and embeds quality, safety, and continuous improvement into daily work. The cultural shift — operators as machine owners, not button-pushers — is the actual point. Calendar-based maintenance without the cultural shift fails.

Myth

We can't do TPM until we have spare time / spare people

Reality

TPM CREATES capacity by eliminating the unplanned downtime that's stealing it now. The 5-10 minutes per shift spent on autonomous checks pays back 5-10x in avoided breakdowns. Companies that wait for 'spare time' never start; companies that start always find the time was already there, hidden in firefighting.

Industry benchmarks

Is your number good?

Calibrate against real-world tiers. Use these ranges as targets — not absolutes.

Overall Equipment Effectiveness (OEE)

Discrete manufacturing across industries

World-Class: ≥ 85%
Strong: 75-85%
Typical: 60-75%
Weak: 40-60%
Crisis: < 40%

Source: Seiichi Nakajima / JIPM (Japan Institute of Plant Maintenance)
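A small lookup makes the tiers easy to apply to a measured number. The published ranges share their endpoints (for example, 75% appears in both Strong and Typical), so this sketch assumes each boundary value belongs to the higher tier. That choice is an assumption of the example, not something the source specifies.

```python
def oee_tier(oee_percent):
    """Map an OEE percentage to a benchmark tier.

    Assumption: a value exactly on a boundary counts toward the higher tier.
    """
    if oee_percent >= 85:
        return "World-Class"
    if oee_percent >= 75:
        return "Strong"
    if oee_percent >= 60:
        return "Typical"
    if oee_percent >= 40:
        return "Weak"
    return "Crisis"


for value in (47, 52, 71, 73, 85):
    print(f"{value}% -> {oee_tier(value)}")
```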

Real-world cases

Companies that lived this.

Verified narratives with the numbers that prove (or break) the concept.

Toyota (Tahara Plant)

1970s-Present

success

Toyota's Tahara plant institutionalized TPM under Seiichi Nakajima's framework: every operator does a 10-minute equipment check at shift start, owns cleaning and lubrication of their station, and is empowered to stop the line on any abnormal sound or vibration. Maintenance technicians shifted from break-fix to coaching operators and tackling complex root causes. The result: OEE consistently above 85% on equipment competitors average 50-55% on, with breakdown rates 1/10th of the US Big Three plants.

OEE: 85%+ sustained
Unplanned Breakdowns: ~10% of US peer plants
Operator Maintenance Time: ~30 min/shift
Breakeven vs. Peers: Lower cost per unit despite higher labor cost

OEE gains don't come from better machines or more techs — they come from giving daily ownership to the people who touch the equipment every shift.


Hypothetical: Mid-Market CPG Co-Packer

Recent

success

A 200-employee co-packer was missing 20% of customer ship dates, blamed 'old equipment' and proposed $4M in line replacements. Honest OEE measurement revealed the bottleneck filling line ran at 47% OEE — losing ~30% to minor stops nobody was logging (jams under 3 min). A 90-day TPM rollout (operator deep-clean kaizen, daily inspection card, planned changeover practice) lifted OEE to 71% with $35K in spend. Throughput rose 50%; the $4M capex was canceled.

OEE Before: 47%
OEE After 90 Days: 71%
Spend: $35K (vs. $4M proposed)
On-Time Ship Rate: 80% → 96%

Before you justify capex on the grounds that 'the equipment is old,' measure honest OEE. Most plants discover 30+ points of throughput hiding in their existing assets.
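The co-packer's throughput figure follows from the OEE change. With planned time and ideal cycle time held constant, good output at the bottleneck scales with OEE, so the relative lift is the ratio of the two OEE values. The sketch below makes that arithmetic explicit; `throughput_lift` is a name chosen for this example.

```python
def throughput_lift(oee_before, oee_after):
    """Relative gain in good output at the bottleneck.

    Assumes planned time and ideal cycle time are unchanged, so output
    is proportional to OEE.
    """
    return oee_after / oee_before - 1


lift = throughput_lift(0.47, 0.71)
capex_avoided = 4_000_000 - 35_000  # proposed line replacement minus TPM spend

print(f"Throughput lift: {lift:.1%}")
print(f"Capex avoided:   ${capex_avoided:,}")
```

The computed lift is about 51%, consistent with the roughly 50% throughput gain in the case. The proportionality only holds for the bottleneck: raising OEE on a non-constraint station does not raise plant output.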

Decision scenario

The Capex vs. TPM Investment Decision

You're VP of Operations at a packaging plant. The CEO has a $1.2M capex slot for next quarter. The plant manager wants a new high-speed filler ($1.2M, claims +25% throughput). The shop floor lead says current OEE on the existing filler is 52% and a TPM program could lift it to 75%+ for a fraction of the cost. The bottleneck IS the filler. You have to decide before Friday.

Current OEE (filler): 52%
Throughput: 1,800 cases/shift
Capex Available: $1.2M
On-Time Ship Rate: 78%
Bottleneck: Filler (confirmed)

Decision 1

You walk the floor. The filler stops 6-8 times per shift for 2-4 min each; operators clear jams and restart, and no one logs it. Daily clean-down takes 90 min because grime has built up for years. The maintenance team only touches the filler when it fails outright. The plant manager's pitch is real: a new machine WOULD be faster. But the existing one has 20+ points of OEE hiding in plain sight.

Option A: Buy the new $1.2M filler. It promises more reliability and more capacity, and the plant manager has experience justifying capex.
Outcome: New filler installed. First-month OEE: 58%, with the same minor stops and the same lack of operator ownership, just on shinier equipment. Throughput rises 12% (not 25%) because the underlying TPM gaps weren't addressed. CFO asks why a $1.2M asset is delivering about half of the promised return. The new machine becomes the new neglected machine. You spent $1.2M for a 12% lift you could have gotten for $40K.
OEE: 52% → 58% · Throughput: +12% · Capex Spent: $1.2M

Option B: Run a 90-day TPM kaizen on the existing filler first ($40K). Hold capex pending results. Reassess in 3 months.
Outcome: Day 1-5: deep-clean kaizen surfaces 14 latent issues (worn seals, misaligned guides, sensor drift). Operators start daily 8-min checks; maintenance moves to scheduled intervals. By month 3, OEE is 73%. Throughput rises 38%, far more than the new machine would have delivered. The unspent $1.16M goes toward upstream capacity (which is now the new bottleneck). You also built the operator culture that will keep the gain. CFO loves you.
OEE: 52% → 73% · Throughput: +38% · Capex Saved: $1.16M redirected to next constraint
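The two outcomes can be put side by side with a few lines of arithmetic. The sketch uses only figures stated in the scenario (1,800 cases/shift baseline, cost, and reported throughput gain per option). The "cost per throughput point" metric and the `compare` helper are illustrative choices for this example, not a standard capex measure.

```python
BASE_CASES_PER_SHIFT = 1800

# Cost and reported throughput gain for each option, as given in the scenario.
options = {
    "Buy new filler": {"cost": 1_200_000, "throughput_gain_pct": 12},
    "TPM kaizen first": {"cost": 40_000, "throughput_gain_pct": 38},
}


def compare(options, base_cases):
    """Return resulting cases/shift and dollars spent per point of throughput gain."""
    results = {}
    for name, option in options.items():
        gain = option["throughput_gain_pct"]
        results[name] = {
            "cases_per_shift": base_cases * (100 + gain) / 100,
            "cost_per_point": option["cost"] / gain,
        }
    return results


results = compare(options, BASE_CASES_PER_SHIFT)
for name, r in results.items():
    print(f"{name:<18}{r['cases_per_shift']:>8,.0f} cases/shift  "
          f"${r['cost_per_point']:>9,.0f} per throughput point")
```

On these numbers the new filler costs $100,000 per point of throughput gain, against roughly $1,050 per point for the kaizen, which is why the scenario favors exhausting the existing asset before spending capex.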
