
AI Agents in Production

An AI agent is an LLM that decides what tools to call, in what order, with what inputs, to achieve a goal, without a human approving each step. The architecture has four parts: a planner (the LLM deciding next actions), a tool registry (functions the agent can call), memory (state across steps), and a controller (a loop with stop conditions). Production-grade agents add a fifth: guardrails (rate limits, budget caps, human-in-the-loop checkpoints, action allowlists). The leap from 'chat with an LLM' to 'an LLM that takes actions' increases business value 10x and incident risk 100x.

Also known as: Autonomous Agents, Agentic Systems, LLM Agents, Tool-Using AI
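The four-part architecture above can be sketched as a minimal controller loop. This is an illustrative sketch, not any vendor's API: `llm_plan` stands in for the planner, the `tools` dict is the registry, `memory` carries state across steps, and the loop with its step and wall-clock caps is the controller (already showing the beginnings of guardrails):

```python
import time

class BudgetExceeded(Exception):
    pass

def run_agent(llm_plan, tools, goal, max_steps=10, max_seconds=120):
    """Minimal controller loop: plan -> act -> observe, with hard stops."""
    memory = []                                  # state carried across steps
    deadline = time.monotonic() + max_seconds
    for _step in range(max_steps):               # budget cap: bounded iterations
        if time.monotonic() > deadline:
            raise BudgetExceeded("wall-clock budget exhausted")
        action = llm_plan(goal, memory)          # planner decides the next action
        if action["tool"] == "finish":           # explicit stop condition
            return action["answer"]
        if action["tool"] not in tools:          # tool registry doubles as allowlist
            memory.append({"error": f"unknown tool {action['tool']}"})
            continue
        observation = tools[action["tool"]](**action["args"])
        memory.append({"action": action, "observation": observation})
    raise BudgetExceeded("step budget exhausted")
```

Note that the loop fails closed: exhausting either budget raises rather than silently returning a partial answer.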

The Trap

The trap is shipping agents without bounded blast radius. A chatbot that gives bad advice is recoverable. An agent with database write access, an unbounded loop, and a $5,000/day API budget is a Bloomberg headline waiting to happen. Common failure modes: (1) infinite loops where the agent keeps trying variations of the same broken tool call, burning tokens at $200/hour, (2) cascading errors where one bad tool output corrupts subsequent steps, (3) capability creep where 'just one more tool' grows the attack surface beyond what was risk-assessed.
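Failure mode (1) is cheap to detect mechanically. A hedged sketch, with an illustrative function name and threshold: count repeated (tool, args) pairs in the action history and abort the run once any pair recurs too often.

```python
import json
from collections import Counter

def loop_guard(history, max_repeats=3):
    """history: list of {"tool": ..., "args": {...}} actions already taken.
    Returns True when the agent should be stopped for repeating itself."""
    seen = Counter(
        (h["tool"], json.dumps(h["args"], sort_keys=True)) for h in history
    )
    return any(n >= max_repeats for n in seen.values())
```

Run this check in the controller before each step; it converts a $200/hour token burn into a single aborted task.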

What to Do

Engineer agents like financial trading systems. Apply five non-negotiable controls: (1) Per-task budget cap (max tokens, max tool calls, max wall-clock minutes; fail closed when exceeded). (2) Tool scopes: least-privilege access; the 'send_email' tool must restrict recipient domains. (3) Mandatory human-in-the-loop for any irreversible action (money movement, DELETE queries, external comms). (4) Trace logging of every plan + tool call + observation; every action is auditable. (5) A 'kill switch' that stops all running agent instances, practiced in drills, not just documented.
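A sketch of controls (1), (3), (4), and (5) wrapped around a single tool call, with (2) represented as an allowlist of irreversible tools. All names here (`KILL_SWITCH`, `IRREVERSIBLE`, `guarded_call`) are illustrative assumptions, not a real framework:

```python
import json
import uuid

KILL_SWITCH = {"stop": False}       # (5) flipped by an operator to halt all agents

IRREVERSIBLE = {"send_email", "delete_rows", "move_money"}   # require approval

def guarded_call(tool_name, tool_fn, args, budget, approve_fn, log_fn):
    """Wrap every tool call in the five controls described above."""
    if KILL_SWITCH["stop"]:
        raise RuntimeError("kill switch engaged")
    if budget["tool_calls"] <= 0:                # (1) per-task budget cap
        raise RuntimeError("tool-call budget exhausted; failing closed")
    budget["tool_calls"] -= 1
    if tool_name in IRREVERSIBLE and not approve_fn(tool_name, args):
        raise PermissionError(f"human rejected {tool_name}")   # (3) HITL gate
    trace_id = str(uuid.uuid4())
    log_fn(json.dumps({"trace": trace_id, "tool": tool_name, "args": args}))  # (4)
    result = tool_fn(**args)
    log_fn(json.dumps({"trace": trace_id, "observation": str(result)}))
    return result
```

In a real system the approval gate would be an async queue and the kill switch a shared flag in a datastore, but the shape is the same: every call passes through one choke point that can refuse it.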

Formula

Agent Risk = Action Reversibility × Action Frequency × Capability Surface − Guardrail Coverage
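The formula is a heuristic, not a measurement, but it is still useful to compute. A toy scorer, assuming all inputs sit on arbitrary but consistent scales (with higher reversibility meaning harder to reverse):

```python
def agent_risk(reversibility, frequency, capability_surface, guardrail_coverage):
    """Heuristic risk score from the formula above. Scales are illustrative:
    the point is the relative comparison, not the absolute number."""
    return reversibility * frequency * capability_surface - guardrail_coverage

# A refund-issuing agent: hard to reverse (8), acts often (6), many tools (5),
# modest guardrails (coverage 100) -> risk 140. Adding HITL and budget caps
# (coverage 230) drives the same agent's score down to 10.
```

The multiplication is the lesson: risk compounds across reversibility, frequency, and surface, while guardrails only subtract, so they must scale with capability.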

In Practice

Cognition Labs' Devin and Vercel's v0 are publicly cited examples of agentic systems in production. Anthropic published case studies and engineering posts about Claude's tool-use deployments showing how customers wrap agents in controllers with budget caps and human approval gates for high-impact actions. The pattern across successful deployments: narrow scope (one job done well), bounded budgets, and explicit checkpoints, not 'AGI in production.'

Pro Tips

  • 01

    Start with 'augmenting agent' (proposes actions, human approves) before 'autonomous agent' (acts then logs). The data you collect from the augmenting phase tells you which actions are safe to auto-approve. Skipping this step is how you build the wrong autonomy boundary.

  • 02

    Idempotency is the unsung hero of agent reliability. Every tool the agent calls should be safe to retry; agents WILL retry on transient errors, and a non-idempotent 'send_invoice' tool will create duplicate invoices in the wild within weeks of launch.

  • 03

    Budget alarms should be aggressive. If a typical task costs $0.40, alarm at $4 and hard-stop at $20. Most production agent disasters start with a runaway loop that nobody noticed until it had spent thousands.

Myth vs Reality

Myth

"Better models will eliminate the need for guardrails"

Reality

Smarter agents make BIGGER mistakes faster. A model good enough to chain 30 tool calls is also good enough to chain 30 wrong tool calls in a way no human reviewer can untangle. Guardrails scale in importance with model capability, not against it.

Myth

"Agents are the future of all AI deployments"

Reality

Most enterprise AI value is delivered by single-step LLM calls (classification, extraction, summarization, drafting). Agents add complexity, cost, and risk. Reach for an agent only when the task genuinely requires multi-step decision-making over uncertain state, and then minimize the autonomy budget.

Try it

Run the numbers.

Pressure-test the concept against your own knowledge: answer the challenge or try the live scenario.

🧪

Knowledge Check

Your team is shipping a customer-service agent that can issue refunds, escalate to humans, and update CRM records. What's the most important guardrail to add?

Industry benchmarks

Is your number good?

Calibrate against real-world tiers. Use these ranges as targets, not absolutes.

Production Agent Maturity (customer-facing autonomous agents)

  • Production-Grade: Budget caps + HITL + tool scopes + trace logs + tested kill switch

  • Beta-Ready: Budget caps + logging + narrow scope

  • Internal Tool Only: Logging only, no hard caps

  • Don't Ship: No budget caps OR no HITL on irreversible actions

Source: Anthropic & OpenAI agent deployment guidance

Real-world cases

Companies that lived this.

Verified narratives with the numbers that prove (or break) the concept.

👨‍💻

Devin (Cognition Labs)

2024-2025

mixed

Cognition Labs publicly demonstrated Devin as a software-engineering agent capable of multi-step tasks: reading repos, writing code, running tests, debugging. Public commentary and the company's own materials discuss the engineering investment in the controller, sandbox environments, and bounded execution, not just the underlying model. Real-world usage exposed the gap between demo and production reliability that all agentic systems face.

Architecture

LLM planner + sandboxed tools + controller

Key Engineering Focus

Controller, sandbox, eval harness

The hard problem of agent products is not the LLM; it's the controller, the sandbox, and the eval harness around it.

Source ↗
🛒

Hypothetical: Procurement Agent Disaster

Composite scenario

failure

A logistics company shipped an autonomous procurement agent: read inventory levels, recommend orders, place purchase orders directly with suppliers under $10K. No daily aggregate cap. A bug in inventory ingestion reported zero stock for 200 SKUs simultaneously. The agent ordered $1.4M of inventory in one afternoon. No single PO tripped the $10K limit. Recovery took weeks of supplier negotiations.

Per-PO Limit

$10K (held)

Daily Aggregate Limit

None

Single-Day Spend

$1.4M

Root Cause

Missing aggregate guardrail + bad input data

Per-action limits are not aggregate limits. Agents need both. And every input the agent reads is a potential exploit vector; bad data is functionally equivalent to a malicious instruction.
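The missing control is a few lines of code. A sketch with illustrative names and thresholds, showing how a daily aggregate cap would have stopped the run after a handful of orders even though every individual PO passed the per-PO check:

```python
PER_PO_LIMIT = 10_000            # the limit the company had (held, but insufficient)
DAILY_AGGREGATE_LIMIT = 50_000   # the limit it was missing (illustrative value)

def approve_po(amount, spent_today):
    """Check BOTH limits before placing a purchase order."""
    if amount > PER_PO_LIMIT:
        return False, "per-PO limit exceeded"
    if spent_today + amount > DAILY_AGGREGATE_LIMIT:
        return False, "daily aggregate limit exceeded"   # the absent guardrail
    return True, "ok"

# Replay the incident: 200 bad SKUs triggering $7K orders each.
spent = 0
placed = 0
for _ in range(200):
    ok, _reason = approve_po(7_000, spent)
    if not ok:
        break          # the aggregate cap halts the run within a handful of POs
    spent += 7_000
    placed += 1
```

With the aggregate cap, the blast radius is one day's budget rather than $1.4M; everything past the cap becomes a human escalation instead of a supplier negotiation.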

Decision scenario

The Agent Autonomy Boundary

You're rolling out an internal IT-support agent. It can: search the knowledge base, reset passwords, install approved software, and open Jira tickets. Your security team wants every action approved by a human; your product team wants full autonomy for speed.

Tickets per Day

2,400

Avg Resolution Time (humans)

47 minutes

Cost per Resolution (humans)

$18

Risk Tolerance (security)

Very low for irreversible actions


Decision 1

You need a policy for which agent actions require human approval.

Full autonomy on all actions; humans only review escalations the agent flags.
Speed metrics are excellent in week 1. In week 3, the agent installs an unapproved software package on 80 machines because of a knowledge-base typo. In week 6, a prompt-injection attack via a support email causes the agent to reset an executive's password and open a misleading ticket. Trust craters; security pulls the plug.
Resolution Time: 47 min → 4 min · Trust: High → Zero in 6 weeks · Project Status: Killed
Tiered autonomy: KB search and Jira ticketing fully autonomous; password resets require user-side MFA confirmation; software installs require IT approval. All actions logged with trace IDs. Aggregate daily caps on each action class.
Resolution time drops from 47 min to 11 min (a huge win) without exposing irreversible blast radius. The user-side MFA on password resets blocks the prompt-injection attempt because the attacker can't satisfy MFA. The software-install workflow goes from 2 days to 4 hours (humans approve in batches). After 90 days you have data showing which categories of password resets are safe to auto-approve, and you expand autonomy with confidence.
Resolution Time: 47 min → 11 min · Cost per Resolution: $18 → $4.20 · Security Incidents: 0
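The winning tiered policy can be expressed as a small allowlist table: each action class carries an approval requirement and a daily aggregate cap, and anything not in the table is denied by default. Action names, caps, and approval labels below are illustrative assumptions, not a vendor API:

```python
POLICY = {
    "kb_search":        {"approval": None,       "daily_cap": 100_000},
    "open_ticket":      {"approval": None,       "daily_cap": 5_000},
    "reset_password":   {"approval": "user_mfa", "daily_cap": 1_000},
    "install_software": {"approval": "it_human", "daily_cap": 200},
}

def authorize(action, counts, mfa_ok=False, it_approved=False):
    """Decide whether the agent may perform `action` given today's counts."""
    rule = POLICY.get(action)
    if rule is None:
        return False          # not on the allowlist: deny by default
    if counts.get(action, 0) >= rule["daily_cap"]:
        return False          # aggregate cap per action class
    if rule["approval"] == "user_mfa" and not mfa_ok:
        return False          # prompt injection can't satisfy user-side MFA
    if rule["approval"] == "it_human" and not it_approved:
        return False          # batch-approved by IT, not by the agent
    return True
```

Expanding autonomy later is a one-line policy change backed by 90 days of data, rather than a rearchitecture.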


Beyond the concept

Turn AI Agents in Production into a live operating decision.

Use this concept as the framing layer, then move into a diagnostic if it maps directly to a current bottleneck.

Typical response time: 24h · No retainer required
