KnowMBA Advisory
Automation · Advanced · 8 min read

Automation Failure Modes

Automation Failure Modes are the recurring patterns that cause automation projects and programs to underdeliver, fail outright, or actively destroy value. The major categories: (1) Automating a broken process (industrializes dysfunction), (2) Brittleness to upstream change (bot breaks on every UI update), (3) Unrealized capacity (saved hours never convert to cost reduction), (4) Governance debt (orphaned bots with no owner), (5) Composition shift (the manual remainder gets harder and more expensive), (6) Black-box opacity (decisions you can't explain or audit), (7) Skill atrophy (humans lose the ability to do the underlying work), and (8) Optimization theater (vanity metrics that don't tie to P&L). Most underperforming automation programs are suffering from 3-5 of these simultaneously.

Also known as: Automation Anti-Patterns, Automation Risks, Why Automation Fails, Automation Pitfalls

The Trap

The trap is treating each failure as a one-off ('that bot broke because of an update,' 'that project missed its ROI because of bad scoping') rather than recognizing them as a recurring class of risks that need systemic mitigation. The same few failure modes (typically five of the eight) appear in roughly 80% of automation post-mortems across industries. When a CIO says 'we'll do automation right this time' but has no specific countermeasures for these eight patterns, the program is on track to repeat the same failures.

What to Do

Build an explicit pre-deployment checklist mapping each failure mode to a control: (1) Process redesign required before automation (vs broken process), (2) API-first decision tree (vs brittleness), (3) Named capacity-redeployment owner (vs unrealized savings), (4) Mandatory ownership + retirement date (vs orphaned bots), (5) Composition shift monitoring (vs eroding economics), (6) Explainability requirements baked in (vs black-box opacity), (7) Training and runbook for the underlying manual process (vs skill atrophy), (8) Verified P&L tracking (vs vanity metrics). Run a quarterly post-mortem against this checklist for every active automation.
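The checklist above works naturally as a deployment gate: an automation ships only when every failure mode has an evidenced control. A minimal sketch of that gate, assuming each control is recorded as present or absent per automation. The mode keys, control labels, `AutomationProposal`, and the example bot are hypothetical names chosen for illustration, not a standard schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical keys for the eight failure modes, each mapped to the control
# named in the checklist. Key names and labels are illustrative only.
CONTROLS: dict[str, str] = {
    "broken_process": "Process redesign completed before automation",
    "brittleness": "API-first decision tree applied",
    "unrealized_capacity": "Named capacity-redeployment owner",
    "governance_debt": "Named owner and retirement date",
    "composition_shift": "Composition shift monitoring in place",
    "black_box_opacity": "Explainability requirements defined",
    "skill_atrophy": "Training and runbook for the manual process",
    "optimization_theater": "Verified P&L tracking",
}


@dataclass
class AutomationProposal:
    """A candidate automation and the controls evidenced for it so far."""

    name: str
    controls_in_place: set[str] = field(default_factory=set)


def missing_controls(proposal: AutomationProposal) -> list[str]:
    """Return failure modes that still lack a control, in checklist order."""
    return [mode for mode in CONTROLS if mode not in proposal.controls_in_place]


def ready_to_deploy(proposal: AutomationProposal) -> bool:
    """Deployment gate: pass only when all eight failure modes have a control."""
    return not missing_controls(proposal)


# Usage with a hypothetical bot that has six of the eight controls in place.
invoice_bot = AutomationProposal(
    name="invoice-matching-bot",
    controls_in_place={
        "broken_process",
        "brittleness",
        "governance_debt",
        "black_box_opacity",
        "skill_atrophy",
        "optimization_theater",
    },
)
print(missing_controls(invoice_bot))  # ['unrealized_capacity', 'composition_shift']
print(ready_to_deploy(invoice_bot))  # False
```

The same `missing_controls` call can be rerun for every active automation at each quarterly post-mortem, so the gate and the review use one definition of "covered."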

Formula

Program Risk Score = Σ(Active Failure Modes × Severity × Coverage Footprint)
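The formula leaves its scales open. A minimal sketch of one way to compute it, assuming `active` is a 0/1 flag, severity is scored 1 to 5, and coverage footprint is the share of the portfolio a mode touches (0.0 to 1.0). These scales and the `FailureModeAssessment` type are assumptions for illustration, not part of the source.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FailureModeAssessment:
    """One row of the risk assessment for a single failure mode."""

    mode: str
    active: bool  # is the failure mode currently observed in the program?
    severity: int  # assumed scale: 1 (minor) to 5 (severe)
    coverage: float  # assumed: share of the portfolio affected, 0.0 to 1.0


def program_risk_score(assessments: list[FailureModeAssessment]) -> float:
    """Sum of Active x Severity x Coverage Footprint over all failure modes.

    Inactive modes contribute zero because `active` is treated as 0 or 1.
    """
    return sum(int(a.active) * a.severity * a.coverage for a in assessments)


# Hypothetical assessment for a small portfolio.
portfolio = [
    FailureModeAssessment("governance_debt", active=True, severity=4, coverage=0.25),
    FailureModeAssessment("brittleness", active=True, severity=3, coverage=0.5),
    FailureModeAssessment("black_box_opacity", active=False, severity=5, coverage=0.1),
]
print(program_risk_score(portfolio))  # 2.5
```

Because the score is a sum, it rises both with more active modes and with wider footprints, which matches the source's point that failure modes compound.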

In Practice

McDonald's Plexure-powered mobile app in Australia famously failed in 2021 when its automated ordering system, optimized for upsell and recommendation, started recommending bizarre item combinations and quietly increasing average order values in ways customers found manipulative. The system optimized perfectly for what it was measured against (basket size) while undermining the long-term customer relationship. This is the textbook 'optimization theater' failure mode: automation that hits its narrow metric while damaging the broader business.

Pro Tips

  • 01

    Run an annual 'failure mode audit' across your automation portfolio. Score each automation against the 8 modes. The portfolio's risk profile is usually concentrated in 2-3 modes; fix those systematically.

  • 02

    The most expensive failure mode in the long run is governance debt (orphaned bots). The most embarrassing in the short run is black-box opacity (unexplainable decisions). The most insidious is composition shift (slow degradation of economics).

  • 03

    When an automation 'works' but the business owner is unhappy, the failure is almost always optimization theater: the bot is hitting its metric but missing the business outcome. Fix the metric, not the bot.
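The annual audit in the first tip reduces to counting how many automations exhibit each mode and looking at the top few. A minimal sketch, assuming audit results are stored as a mapping from automation name to the set of modes it exhibits; the bot names and their findings below are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical audit results: each automation mapped to the modes it exhibits.
AUDIT = {
    "ap-invoice-bot": {"governance_debt", "brittleness"},
    "hr-onboarding-bot": {"brittleness"},
    "claims-triage-bot": {"black_box_opacity", "brittleness", "optimization_theater"},
    "vendor-master-bot": {"governance_debt"},
    "payroll-recon-bot": set(),
}


def mode_concentration(
    audit: dict[str, set[str]], top_n: int = 3
) -> list[tuple[str, int, float]]:
    """Return the top_n failure modes with their bot counts and portfolio share."""
    counts = Counter(mode for modes in audit.values() for mode in modes)
    total = len(audit)
    return [(mode, n, n / total) for mode, n in counts.most_common(top_n)]


for mode, n, share in mode_concentration(AUDIT, top_n=2):
    print(f"{mode}: {n} bots ({share:.0%} of portfolio)")
# brittleness: 3 bots (60% of portfolio)
# governance_debt: 2 bots (40% of portfolio)
```

In this toy portfolio two modes account for most of the findings, which is the kind of concentration the tip suggests fixing systematically rather than bot by bot.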

Myth vs Reality

Myth

“Better technology will prevent these failures”

Reality

Every failure mode listed is observable in programs using state-of-the-art technology. The failures are operational and organizational, not technical. New tools don't fix old problems; they just create new versions of them.

Myth

“Most automation failures are about bad code or bugs”

Reality

Bug-level failures are noise. The failures that destroy value are structural: wrong process, wrong measurement, wrong ownership. Engineering quality matters but is rarely the dominant factor in program-level success or failure.

Knowledge Check

Your automation program shows 240% ROI on the dashboard but the CFO can't find the savings in the P&L. Which failure mode is most likely operating?

Industry benchmarks

Is your number good?

Calibrate against real-world tiers. Use these ranges as targets, not absolutes.

Healthy Automation Portfolio Profile

Enterprise automation portfolios at 18+ months of operation

Mature: Orphaned <5%, Broken <10%, Verified ROI >70%

Healthy: Orphaned <15%, Broken <20%, Verified ROI >50%

Strained: Orphaned <25%, Broken <30%, Verified ROI >30%

Distressed: Outside the Strained thresholds on any dimension

Source: Deloitte / EY Intelligent Automation Maturity Reports
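The tiers can be applied mechanically: a portfolio sits in the best tier whose three thresholds it meets. A minimal sketch, assuming all three figures are percentages of bots in the portfolio and that "Verified ROI" means the share of bots with verified P&L impact. That reading, the `TIERS` table, and `portfolio_tier` are interpretations for illustration.

```python
# Tier thresholds from the benchmark, best tier first:
# (name, max orphaned %, max broken %, min verified-ROI %)
TIERS = [
    ("Mature", 5, 10, 70),
    ("Healthy", 15, 20, 50),
    ("Strained", 25, 30, 30),
]


def portfolio_tier(orphaned_pct: float, broken_pct: float, verified_pct: float) -> str:
    """Return the best tier whose thresholds are all met, else 'Distressed'."""
    for name, max_orphaned, max_broken, min_verified in TIERS:
        if (
            orphaned_pct < max_orphaned
            and broken_pct < max_broken
            and verified_pct > min_verified
        ):
            return name
    return "Distressed"


print(portfolio_tier(12, 18, 55))  # Healthy
print(portfolio_tier(22, 35, 28))  # Distressed
```

The second call uses the logistics carrier's figures from the case below (22% orphaned, 35% in remediation counted as broken, 28% with verified P&L impact); under this reading it falls outside the Strained thresholds on two dimensions.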

Real-world cases

McDonald's / Plexure (App Recommendations)

2021

failure

McDonald's Australia faced public criticism when its mobile ordering app, powered by Plexure's recommendation engine, generated bizarre upsell suggestions and quietly nudged average order values upward. The automation optimized perfectly for its narrow metric (basket size) while damaging customer trust. The episode became a widely cited example of optimization theater: the system 'worked' but undermined the business outcome it was meant to serve.

Optimization Target

Basket size / upsell success

Outcome

Customer backlash, reputational damage

Failure Mode

Optimization theater

Lesson

The right metric is broader than the bot's KPI

Automation will optimize for whatever you measure it on. If the metric isn't aligned with the business outcome, the automation will reliably hit the metric while undermining the outcome.

Hypothetical: Logistics Carrier RPA Portfolio Failure

2019-2024

failure

A logistics carrier built a 280-bot RPA portfolio between 2019 and 2022. By 2024, 35% of bots were in remediation, 22% were orphaned, only 28% had verified P&L impact, and the 40 bots making routing decisions had no explainability documentation. A year-end audit identified 6 of the 8 major failure modes operating simultaneously. The carrier wrote off 140 bots and rebuilt the program around a structured failure-mode prevention checklist.

Bots at Failure Audit

280

Failure Modes Active

6 of 8

Bots Written Off

140 (50%)

Remediation Cost

~$2.4M + 14 months

Failure modes compound. A program with 1-2 active modes is recoverable; a program with 5+ active modes typically requires a reset, not a fix.
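The compounding rule of thumb can be written as a simple decision function. A minimal sketch: the 1-2 ("fix") and 5+ ("reset") bands come from the text above, while the 0 ("monitor") and 3-4 ("triage") bands are assumptions added to make the function total, and `recovery_path` is a hypothetical name.

```python
def recovery_path(active_modes: int) -> str:
    """Map the number of active failure modes (0-8) to a recovery path.

    1-2 -> "fix" and 5+ -> "reset" follow the rule of thumb in the text.
    0 -> "monitor" and 3-4 -> "triage" are assumed bands, not from the source.
    """
    if not 0 <= active_modes <= 8:
        raise ValueError("expected between 0 and 8 active failure modes")
    if active_modes == 0:
        return "monitor"
    if active_modes <= 2:
        return "fix"
    if active_modes <= 4:
        return "triage"
    return "reset"


# The logistics carrier above had 6 of 8 modes active.
print(recovery_path(6))  # reset
```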

Decision scenario

Diagnosing a Stalled Automation Program

You're brought in as VP of Automation at a 6,000-person services firm. The 3-year-old program has 110 bots, a 6-FTE maintenance team, $1.8M in annual operating cost, and CFO-verified P&L impact of $1.1M (i.e., negative ROI). The CEO wants a 90-day diagnostic and a recommendation.

Bots in Production

110

Annual Operating Cost

$1.8M

Verified P&L Impact

$1.1M

Net Position

−$0.7M annually

Decision 1

Diagnostic reveals the failure mode mix. Your recommendation will frame the next 18 months.

Recommend doubling the maintenance team to clear the backlog and stabilize all 110 bots
Six months in: 12 FTE, $3.2M annual cost, P&L impact creeps to $1.3M. Net position is now −$1.9M. The 'fix everything' approach makes the economic problem worse. The CEO loses confidence and the program is restructured under IT with deep cuts.
Annual Net Position: −$0.7M → −$1.9M
Program Status: Forced restructuring
Triage: keep the 30 highest-ROI bots, retire the 50 lowest-ROI, replace the brittle middle 30 with process redesign + API integration over 12 months. Cut maintenance to 3 FTE.
Hard quarter politically. After 12 months: 60 well-maintained bots producing $1.9M verified P&L. Operating cost dropped to $750K. Net position swings from −$0.7M to +$1.15M. Program credibility rebuilt. The CEO becomes a sponsor.
Annual Net Position: −$0.7M → +$1.15M
Bots in Production: 110 → 60
Recommend shutting down the entire program as fundamentally broken
$1.1M of verified P&L impact disappears overnight. The ~30 high-performing bots that were genuinely creating value are killed alongside the underperformers. The political cost is acceptable but the economic cost is unnecessary. The right answer is almost never 'kill everything.'
Verified P&L Impact: $1.1M → $0 (all verified savings lost)
Program Status: Terminated
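The net positions in the options above follow from one subtraction: verified P&L impact minus annual operating cost. A minimal sketch that reproduces that arithmetic, using the scenario's figures in $K so the integers stay exact; the option labels and `net_position` are illustrative names. The shutdown option is omitted because its headline figure is the $1.1M of verified savings that disappears, not a P&L-minus-cost calculation.

```python
# Annual figures in $K from the scenario: (verified P&L impact, operating cost).
SCENARIO = {
    "current program": (1_100, 1_800),
    "double maintenance team": (1_300, 3_200),
    "triage to 60 bots": (1_900, 750),
}


def net_position(verified_pnl_k: int, operating_cost_k: int) -> int:
    """Annual net position in $K: verified P&L impact minus operating cost."""
    return verified_pnl_k - operating_cost_k


for option, (pnl, cost) in SCENARIO.items():
    print(f"{option}: {net_position(pnl, cost):+,} $K")
# current program: -700 $K
# double maintenance team: -1,900 $K
# triage to 60 bots: +1,150 $K
```

Doubling the team adds $200K of verified impact for $1.4M of extra cost, while triage raises impact and cuts cost at the same time, which is why the two options move in opposite directions.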

Beyond the concept

Turn Automation Failure Modes into a live operating decision.

Use this concept as the framing layer, then move into a diagnostic if it maps directly to a current bottleneck.

Typical response time: 24h · No retainer required