Conversational AI Design
Conversational AI design is the discipline of shaping how an AI assistant talks: scope, persona, turn-taking, fallback behavior, escalation paths, and what the bot is explicitly NOT allowed to do. The biggest design lever is scope. A bot that does one thing well (e.g., 'check my order status') beats a bot that tries to do everything badly. Scope discipline produces 80%+ contained-resolution rates; scope sprawl produces 25% rates and customer rage. Design is also where you decide tone (warm vs efficient), failure UX (what the bot says when it doesn't know), and escalation triggers (when humans are pulled in).
The Trap
The trap is launching a 'do everything' general assistant because LLMs technically can answer almost anything. Customers ask the bot how to file an insurance claim, the bot confidently produces inaccurate steps, and the customer churns. Equally dangerous: designing a bot that never admits uncertainty. The bot saying 'I don't know, here's a human' produces higher trust than the bot inventing an answer. Conversational AI fails when designers optimize for breadth and confidence instead of scope and honest failure modes.
What to Do
Design in this order: (1) Scope: list the 5-10 jobs the bot does and the explicit list of jobs it refuses with handoff. (2) Persona: one paragraph defining tone, voice, formality, and humor level. (3) Turn structure: opening, clarification, action, confirmation, closing. (4) Failure UX: exactly what the bot says when it can't help, with smooth handoff to a human. (5) Escalation triggers: frustration signals, complex intents, regulated topics. Test with real users (not internal staff) before launch. Measure containment, CSAT, and escalation accuracy weekly.
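The ordering above can be sketched as a scope configuration plus a thin routing layer. This is a minimal illustration, not a real framework: the intent names, signal names, and the `route` function are all assumptions for the sketch.

```python
# Minimal sketch of a scope-first bot design. All names and categories
# here are illustrative assumptions, not a specific product's API.

IN_SCOPE = {"order_status", "returns", "shipping"}            # jobs the bot does
HANDOFF = {"insurance_claim", "legal", "account_closure"}     # explicit refusals
ESCALATION_TRIGGERS = {"frustration", "regulated_topic", "complex_dispute"}

OPENING = ("I'm Aria. I can help with order status, returns, and shipping. "
           "For other questions I'll connect you with a teammate.")

FAILURE_REPLY = "I don't know the answer to that - let me get you to a person who does."

def route(intent: str, signals: set) -> str:
    """Decide the next action for a classified user turn."""
    if signals & ESCALATION_TRIGGERS:
        return "escalate_to_human"          # escalation triggers win over intent
    if intent in HANDOFF or intent not in IN_SCOPE:
        return "refuse_and_handoff"         # graceful failure path, never a guess
    return "answer"

# Even an in-scope question escalates when a frustration signal is present.
print(route("order_status", {"frustration"}))  # escalate_to_human
print(route("insurance_claim", set()))         # refuse_and_handoff
print(route("returns", set()))                 # answer
```

Note the design choice: anything not explicitly in scope falls through to the refusal path, so new or unexpected intents fail safely instead of inviting a confident wrong answer.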
In Practice
Klarna's customer service AI handled work equivalent to ~700 agents per Klarna's announcements; the design choice was tight scope (order/payment-related questions) with explicit escalation. Intercom's Fin and Ada's customer service bots both publish design patterns emphasizing scope discipline and confident escalation. Air Canada's chatbot incident, where a court held Air Canada liable for a refund the bot incorrectly promised, became the canonical case for why bots must be scoped, monitored, and bounded by clear policies.
Pro Tips
1. Design the failure path before the success path. The success path is what your bot does when everything works; the failure path is what defines whether customers trust the bot. A graceful "I don't know, let me get you to a person who does" beats a confident wrong answer every time.
2. Use "sticky scope" wording in the bot's opening. "I'm Aria, I can help with order status, returns, and shipping. For other questions I'll connect you with a teammate." Customers self-route faster, and expectations are calibrated. A generic "How can I help you today?" invites every question and produces every failure.
3. Treat the conversation log as the QA dataset. Pull 50 random transcripts per week, score them for accuracy/tone/escalation, and feed the failures back into prompt updates and scope refinements. Without this loop, the bot's quality drifts down silently.
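The review loop in the last tip can be sketched as a weekly sampling-and-scoring job. The transcript fields and the 0/1 scoring rubric here are illustrative assumptions; substitute your own rubric and log schema.

```python
import random

def weekly_qa_sample(transcripts, n=50, seed=None):
    """Pull a random sample of the week's transcripts for human scoring."""
    rng = random.Random(seed)
    return rng.sample(transcripts, min(n, len(transcripts)))

def summarize_scores(scored):
    """Aggregate reviewer scores; each dimension is scored 0 (fail) or 1 (pass)."""
    total = len(scored)
    return {
        dim: sum(t[dim] for t in scored) / total
        for dim in ("accuracy", "tone", "escalation")
    }

# Example: two scored transcripts -> per-dimension pass rates.
scored = [
    {"accuracy": 1, "tone": 1, "escalation": 0},
    {"accuracy": 1, "tone": 0, "escalation": 1},
]
print(summarize_scores(scored))  # {'accuracy': 1.0, 'tone': 0.5, 'escalation': 0.5}
```

The per-dimension pass rates become the weekly trend line; a falling escalation score is the early warning that the bot is answering questions it should be handing off.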
Myth vs Reality
Myth
"Modern LLMs are good enough that you don't need conversation design"
Reality
LLMs are good at sounding good, not at staying on-scope, refusing politely, or escalating reliably. Without design, an LLM-backed bot will confidently answer questions outside its competence. Design is the difference between a useful bot and a liability.
Myth
"Voice and chat designs are interchangeable"
Reality
Voice has different turn-taking constraints, no scrolling/re-reading, higher latency tolerance for clarification, and harder error recovery. A chat design ported to voice typically fails on user comprehension. Design separately or accept lower quality.
Industry benchmarks
Is your number good?
Calibrate against real-world tiers. Use these ranges as targets, not absolutes.
Customer Service Bot Containment Rate
Customer service bots handling routine intents at consumer-facing companies:
- Elite (Tightly Scoped): 75-90%
- Strong: 55-75%
- Average: 35-55%
- Weak (Often Sprawled): <35%
Source: Klarna, Intercom Fin, Ada published benchmarks + Forrester customer service AI reports
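Containment rate, as benchmarked above, is simply the share of conversations resolved without human handoff. A minimal computation, with a hypothetical `escalated` flag on each logged conversation:

```python
def containment_rate(conversations):
    """Share of conversations resolved without human handoff.

    Each conversation dict is assumed to carry an 'escalated' boolean flag.
    """
    if not conversations:
        return 0.0
    contained = sum(1 for c in conversations if not c["escalated"])
    return contained / len(conversations)

# 8 of 10 conversations resolved by the bot -> 80%, inside the elite band.
logs = [{"escalated": False}] * 8 + [{"escalated": True}] * 2
print(f"{containment_rate(logs):.0%}")  # 80%
```

One caveat worth encoding in the real metric: a conversation the user abandoned in frustration is not "contained", so production definitions usually require an explicit resolution signal, not just the absence of a handoff.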
Real-world cases
Companies that lived this.
Verified narratives with the numbers that prove (or break) the concept.
Klarna AI Customer Service
2024
Klarna's OpenAI-powered customer service bot reportedly handled volume equivalent to 700 full-time agents within a month of launch, with parity-or-better customer satisfaction in their reporting. The design pattern: tightly scoped to payment, order, refund, and dispute topics; explicit handoff for complex disputes; tone and persona tuned to Klarna's brand voice. The reported containment rate was high precisely because scope discipline kept the bot inside its competence.
- Workload Equivalent: ~700 FTEs
- Resolution Time: Reduced ~80%
- CSAT: Parity with human agents (per Klarna)
Scope discipline produces the headline numbers. Klarna did not build a general assistant; they built a payment/order assistant that does one job extremely well.
Air Canada Chatbot
2022-2024
An Air Canada chatbot told a grieving customer he could apply for a bereavement discount retroactively after travel. Air Canada later refused, claiming the bot was a 'separate legal entity' responsible for its own information. The Canadian Civil Resolution Tribunal disagreed and ordered Air Canada to honor the bot's promise. The case became canonical for two reasons: courts hold companies liable for what their bots say, and bots without proper scope and bounded responses create real legal liability.
- Disputed Refund: Small ($812 CAD)
- Precedent Set: Companies liable for bot statements
- Reputational Cost: Major
Treat your bot's outputs as company statements with full legal weight. Design scope and grounding to prevent the bot from making promises the company won't honor.