KnowMBAAdvisory
AI Strategy · Intermediate · 7 min read

Conversational AI Design

Conversational AI design is the discipline of shaping how an AI assistant talks: scope, persona, turn-taking, fallback behavior, escalation paths, and what the bot is explicitly NOT allowed to do. The biggest design lever is scope. A bot that does one thing well (e.g., 'check my order status') beats a bot that tries to do everything badly. Scope discipline produces 80%+ contained-resolution rates; scope sprawl produces 25% rates and customer rage. Design is also where you decide tone (warm vs efficient), failure UX (what the bot says when it doesn't know), and escalation triggers (when humans are pulled in).

Also known as: Chatbot Design, Conversation Design, Voice AI Design, Dialogue Design, LLM UX Design

The Trap

The trap is launching a 'do everything' general assistant because LLMs technically can answer almost anything. Customers ask the bot how to file an insurance claim, the bot confidently produces inaccurate steps, and the customer churns. Equally dangerous: designing a bot that never admits uncertainty. The bot saying 'I don't know, here's a human' produces higher trust than the bot inventing an answer. Conversational AI fails when designers optimize for breadth and confidence instead of scope and honest failure modes.

What to Do

Design in this order: (1) Scope: list the 5-10 jobs the bot does and the explicit list of jobs it refuses with handoff. (2) Persona: one paragraph defining tone, voice, formality, and humor level. (3) Turn structure: opening, clarification, action, confirmation, closing. (4) Failure UX: exactly what the bot says when it can't help, with smooth handoff to a human. (5) Escalation triggers: frustration signals, complex intents, regulated topics. Test with real users (not internal staff) before launch. Measure containment, CSAT, and escalation accuracy weekly.
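The scope, failure UX, and escalation pieces of this checklist can be sketched as a simple routing layer in front of the model. This is a hypothetical illustration: the intent names, trigger phrases, and the `route` helper are assumptions for the sketch, not a real framework.

```python
# Hypothetical routing layer: decide per turn whether the bot handles,
# refuses with handoff, escalates, or falls back. All names are illustrative.
IN_SCOPE_JOBS = {"order_status", "returns", "shipping", "payment_issue"}
OUT_OF_SCOPE = {"insurance_claims", "legal_advice", "medical_questions"}

ESCALATION_TRIGGERS = {
    "frustration": ["this is ridiculous", "speak to a human", "useless"],
    "regulated": ["refund policy exception", "lawsuit"],
}

FALLBACK = ("I don't know the answer to that, and I'd rather not guess. "
            "Let me connect you with a teammate who can help.")

def route(intent: str, message: str) -> str:
    """Return the action for one turn: escalate, handle, refuse, or fallback."""
    lowered = message.lower()
    # Escalation triggers are checked first: frustration or regulated
    # topics override normal scope routing.
    for reason, phrases in ESCALATION_TRIGGERS.items():
        if any(p in lowered for p in phrases):
            return f"escalate:{reason}"
    if intent in IN_SCOPE_JOBS:
        return "handle"
    if intent in OUT_OF_SCOPE:
        return "refuse_with_handoff"
    return "fallback"
```

The point of the sketch is the ordering: escalation triggers are evaluated before scope, so a frustrated customer never gets trapped in the bot's happy path.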

Formula

Conversational Quality = (Containment Rate × CSAT × Escalation Accuracy), penalized by Hallucination Rate
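One way to make the formula concrete, assuming a multiplicative score with a simple hallucination discount (the exact penalty form is not specified here, so this is a sketch under that assumption):

```python
def conversational_quality(containment: float, csat: float,
                           escalation_acc: float,
                           hallucination_rate: float) -> float:
    """Multiply the three quality rates, then discount by hallucination
    rate. The (1 - hallucination_rate) penalty form is an assumption."""
    return containment * csat * escalation_acc * (1 - hallucination_rate)

# Illustrative numbers: a tightly scoped bot vs a sprawled one.
tight = conversational_quality(0.80, 0.85, 0.90, 0.02)
sprawl = conversational_quality(0.25, 0.60, 0.50, 0.15)
```

With these illustrative inputs the tightly scoped bot scores roughly ten times higher, which is the article's point: containment, satisfaction, and escalation accuracy compound, so scope sprawl drags the whole product down.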

In Practice

Klarna's customer service AI handled work equivalent to ~700 agents, per Klarna's announcements; the design choice was tight scope (order/payment-related questions) with explicit escalation. Intercom's Fin and Ada's customer service bots both publish design patterns emphasizing scope discipline and confident escalation. Air Canada's chatbot incident, in which a tribunal held Air Canada liable for a refund the bot incorrectly promised, became the canonical case for why bots must be scoped, monitored, and bounded by clear policies.

Pro Tips

  • 01

Design the failure path before the success path. The success path is what your bot does when everything works; the failure path is what defines whether customers trust the bot. A graceful "I don't know, let me get you to a person who does" beats a confident wrong answer every time.

  • 02

    Use 'sticky scope' wording in the bot's opening. 'I'm Aria, I can help with order status, returns, and shipping. For other questions I'll connect you with a teammate.' Customers self-route faster, and expectations are calibrated. Generic 'How can I help you today?' invites every question and produces every failure.

  • 03

    Treat the conversation log as the QA dataset. Pull 50 random transcripts per week, score them for accuracy/tone/escalation, and feed the failures back into prompt updates and scope refinements. Without this loop, the bot's quality drifts down silently.
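The weekly sampling loop in the tip above can be sketched in a few lines. The helper name, the scoring dimensions as dict keys, and the `None` placeholders for human scores are all assumptions for illustration:

```python
import random

def weekly_qa_sample(transcripts: list, n: int = 50, seed=None) -> list:
    """Pull a random sample of transcripts for manual scoring.
    Returns one blank scorecard per sampled transcript, using the
    three dimensions from the tip: accuracy, tone, escalation."""
    rng = random.Random(seed)  # seeded for a reproducible weekly pull
    sample = rng.sample(transcripts, min(n, len(transcripts)))
    return [{"transcript": t, "accuracy": None, "tone": None,
             "escalation": None} for t in sample]

# Example: 50 scorecards drawn from a week of 400 conversations.
scorecards = weekly_qa_sample([f"conv-{i}" for i in range(400)], n=50, seed=7)
```

The scorecards are filled in by a human reviewer; the failures they surface feed the prompt updates and scope refinements the tip describes.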

Myth vs Reality

Myth

"Modern LLMs are good enough that you don't need conversation design"

Reality

LLMs are good at sounding good, not at staying on-scope, refusing politely, or escalating reliably. Without design, an LLM-backed bot will confidently answer questions outside its competence. Design is the difference between a useful bot and a liability.

Myth

"Voice and chat designs are interchangeable"

Reality

Voice has different turn-taking constraints, no scrolling/re-reading, higher latency tolerance for clarification, and harder error recovery. A chat design ported to voice typically fails on user comprehension. Design separately or accept lower quality.


Industry benchmarks

Is your number good?

Calibrate against real-world tiers. Use these ranges as targets, not absolutes.

Customer Service Bot Containment Rate

Customer service bots handling routine intents at consumer-facing companies

Elite (Tightly Scoped): 75-90%
Strong: 55-75%
Average: 35-55%
Weak (Often Sprawled): <35%

Source: Klarna, Intercom Fin, Ada published benchmarks + Forrester customer service AI reports
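The tiers above reduce to a small lookup. The function name and the choice of inclusive lower bounds are assumptions for the sketch:

```python
def containment_tier(rate: float) -> str:
    """Map a containment rate (0.0-1.0) to the benchmark tiers above.
    Using >= at each lower bound is an assumption about boundaries."""
    if rate >= 0.75:
        return "Elite (Tightly Scoped)"
    if rate >= 0.55:
        return "Strong"
    if rate >= 0.35:
        return "Average"
    return "Weak (Often Sprawled)"
```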

Real-world cases

Companies that lived this.

Verified narratives with the numbers that prove (or break) the concept.

๐Ÿ›๏ธ

Klarna AI Customer Service

2024

success

Klarna's OpenAI-powered customer service bot reportedly handled volume equivalent to 700 full-time agents within a month of launch, with parity-or-better customer satisfaction in their reporting. The design pattern: tightly scoped to payment, order, refund, and dispute topics; explicit handoff for complex disputes; tone and persona tuned to Klarna's brand voice. The reported containment rate was high precisely because scope discipline kept the bot inside its competence.

Workload equivalent: ~700 FTEs
Resolution time: reduced ~80%
CSAT: parity with human agents (per Klarna)

Scope discipline produces the headline numbers. Klarna did not build a general assistant; they built a payment/order assistant that does one job extremely well.


Air Canada Chatbot (2022-2024) · Failure

An Air Canada chatbot told a grieving customer he could apply for a bereavement discount retroactively after travel. Air Canada later refused, claiming the bot was a 'separate legal entity' responsible for its own information. The Canadian Civil Resolution Tribunal disagreed and ordered Air Canada to honor the bot's promise. The case became canonical for two reasons: courts hold companies liable for what their bots say, and bots without proper scope and bounded responses create real legal liability.

Disputed refund: small (~$812 CAD)
Precedent set: companies liable for bot statements
Reputational cost: major

Treat your bot's outputs as company statements with full legal weight. Design scope and grounding to prevent the bot from making promises the company won't honor.

