Voice agents and WCAG 2.2 AA: eight quiet hearing failures

Eight ways your voice agent quietly fails hearing-impaired callers under WCAG 2.2 AA, ranked by which ones a Dutch procurement officer will surface in the second-round questionnaire.

Jacob Molkenboer · Founder · A Brand New Company · 1 Dec 2024 · 9 min

It is 14:00 on a Tuesday at a mid-sized Dutch gemeente. The procurement officer has your voice-agent demo running in one tab and the second-round accessibility questionnaire open in another. She is not an accessibility specialist. She has a checklist that maps to EN 301 549 and a habit of asking "show me" whenever a vendor writes "yes" in a compliance box.

That tab combination is where most voice-agent pilots quietly lose. The agent works fine for the hearing caller in the demo script. It also breaks WCAG 2.2 AA for the hearing-impaired caller in ways that are invisible until someone files a question.

This is a field guide to the eight failures we see most often, ranked by which one she catches first.

1. No live transcript on a surface the caller can reach

The most common failure is also the cheapest to spot. The reviewer asks: "show me where the live transcript appears." If your answer is "we have one, it is in our internal monitoring dashboard," the next box gets a red mark.

WCAG 2.2 by itself does not mandate live captioning for a phone call. EN 301 549 clause 6.2.1.1 requires real-time text (RTT) capability wherever in-scope ICT supports two-way voice communication, and the Dutch public-sector register leans on it as the operational test. A hearing-impaired caller needs to see what the agent is saying, in something close to real time, on a surface they can actually reach. An internal dashboard does not qualify. A WhatsApp or web companion view does.

The cheapest fix we have shipped is a short link, sent by SMS at the start of the call, that opens a web view streaming the agent's text via server-sent events. Latency from generated token to rendered text sits under 400 ms.
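
A minimal sketch of the streaming side, assuming the agent's text arrives as an iterable of string chunks. The function names and the `agent_text` event name are hypothetical; only the wire format (`id:`, `event:` and `data:` lines, closed by a blank line) comes from the server-sent events specification.

```python
from typing import Iterable, Iterator, Optional


def format_sse_event(data: str, event: Optional[str] = None,
                     event_id: Optional[str] = None) -> str:
    """Serialise one message in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines is split into one "data:" line per part.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def transcript_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one SSE event per chunk of agent text, numbered from 1."""
    for index, chunk in enumerate(chunks, start=1):
        yield format_sse_event(chunk, event="agent_text", event_id=str(index))
```

Numbering the events lets a browser that drops the connection report the last id it saw when it reconnects, so the companion view can resume instead of restarting the transcript.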

2. Audio-only one-time passcodes

This fails WCAG 2.2 success criterion 3.3.8, Accessible Authentication (Minimum), in the cleanest possible way, and it is the failure your security team will defend the longest.

The pattern: the caller verifies identity by repeating a 6-digit code the agent reads aloud. If the caller cannot hear the code, they cannot authenticate. There is no cognitive-function-test exception that rescues this, because the test is hearing, not cognition.

The fix is structural. Deliver the code through a channel the caller has already nominated: SMS, email, an authenticated app, or a webview tied to the call session. "We can fall back to SMS on request" reads worse on a questionnaire than "the caller picks their preferred channel during the IVR opener."
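
One way to make "never audio-only" a property of the code path, not a policy someone has to remember. This is a sketch under assumptions: the `Caller` fields, the channel names, and the `ask_caller` fallback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical channel names; "voice" is deliberately absent from this set.
NON_AUDIO_CHANNELS = ("sms", "email", "app", "webview")


@dataclass
class Caller:
    preferred_channel: Optional[str] = None  # nominated during the IVR opener
    has_companion_view: bool = False         # a webview is tied to this call


def otp_channel(caller: Caller) -> str:
    """Return the channel for the one-time passcode; never audio-only."""
    if caller.preferred_channel in NON_AUDIO_CHANNELS:
        return caller.preferred_channel
    if caller.has_companion_view:
        return "webview"
    # Nothing usable was nominated: ask the caller, do not read the code aloud.
    return "ask_caller"
```

Because reading the code aloud is not a value this function can return, the audio-only path cannot reappear through a config change.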

3. No handoff to Teletolk or another text-relay service

In the Netherlands, the named relay service for d/Deaf and hard-of-hearing callers is KPN Teletolk. Procurement officers know the name. They will ask whether the agent can accept an incoming Teletolk call, and what happens to the conversational state when a human relay operator joins the line.

The honest answer for most current voice agents is "the agent treats the relay operator as the caller, which means the operator's voice gets transcribed, the caller's typed input never reaches the agent, and the loop breaks halfway through." This is not a WCAG criterion on its own. It is the operational consequence of failing several at once: 2.2.1 Timing Adjustable, 3.3.1 Error Identification, and the EN 301 549 RTT clauses.

The minimum acceptable answer is: detect that a relay is on the line, slow the agent's pacing, lengthen every timeout, and offer to escalate to a human queue. The better answer is a dedicated text channel the relay operator can drive directly, with a session token that ties their typed input to the same conversation history the spoken caller would have built.
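
The session-token idea can be sketched as a store in which spoken and typed turns append to the same history. Class and field names here are hypothetical, and a production version would persist sessions and expire tokens; this only shows the shape.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Conversation:
    relay_on_line: bool = False
    # Each turn is stored as (channel, text), in arrival order.
    history: List[Tuple[str, str]] = field(default_factory=list)


class SessionStore:
    """Maps opaque session tokens to a single shared conversation."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Conversation] = {}

    def open(self, relay_on_line: bool = False) -> str:
        token = secrets.token_urlsafe(16)
        self._sessions[token] = Conversation(relay_on_line=relay_on_line)
        return token

    def add_turn(self, token: str, channel: str, text: str) -> Conversation:
        # Voice and text turns land in the same history, so neither channel
        # starts from an empty context.
        conversation = self._sessions[token]
        conversation.history.append((channel, text))
        return conversation
```

The relay operator's text channel and the voice line both call `add_turn` with the same token, which is what keeps the conversational state intact when the operator joins.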

4. Timeouts calibrated for hearing callers only

A standard voice agent gives the caller five to seven seconds of silence before it reprompts. For a caller using a relay service, that window is gone before the operator has finished typing the prompt.

This is WCAG 2.2 success criterion 2.2.1, Timing Adjustable. The criterion requires that a user can turn off, adjust, or extend any time limit that is not essential. A reprompt window is not essential. It is a UX comfort.

The cheap fix is a session flag checked on every turn. If the session was opened by a relay number, or if the caller explicitly told the IVR opener "I am using a relay service," every timeout in the dialogue tree is multiplied by three. Hard-code this. Do not rely on the model to remember.

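def reprompt_timeout(session):
    """Return the seconds of silence to allow before the agent reprompts."""
    base_seconds = 6.0
    # Relay calls need the longest window: the operator has to type each prompt.
    if session.relay_detected or session.caller_declared_relay:
        return base_seconds * 3
    # Callers who asked the agent to slow down get double the default.
    if session.caller_requested_slow:
        return base_seconds * 2
    return base_seconds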

5. Confirmation paths that only accept spoken "yes"

The agent says: "I'm about to book the appointment for Thursday at 10. Say yes to confirm." A hearing-impaired caller using a relay cannot speak the word in a way the speech-to-text layer will recognise as confirmation. A caller who can hear but cannot speak clearly has the same problem.

This compounds with criterion 3.3.4 (Error Prevention) on transactional flows. The booking is reversible in theory, but the caller has no way to confirm the reversal either.

Every confirmation step needs at least one non-voice path. DTMF (press 1) is the boring, correct answer. A web confirm button on the companion view is better. Both at once, with the agent narrating the option, is the version that earns full marks on the questionnaire.
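
A sketch of a confirmation handler that treats speech, DTMF, and the web button as equal paths. The accepted words, the "press 2 to decline" key, and the `confirm`/`cancel` button values are assumptions for illustration, not a fixed convention.

```python
from typing import Optional

# Assumed accepted inputs; adjust to the flow's language and keypad plan.
SPOKEN_YES = {"yes", "ja"}
SPOKEN_NO = {"no", "nee"}


def interpret_confirmation(channel: str, value: str) -> Optional[bool]:
    """Return True (confirm), False (decline), or None (not understood)."""
    value = value.strip().lower()
    if channel == "speech":
        if value in SPOKEN_YES:
            return True
        if value in SPOKEN_NO:
            return False
    elif channel == "dtmf":
        if value == "1":
            return True
        if value == "2":
            return False
    elif channel == "web":
        if value == "confirm":
            return True
        if value == "cancel":
            return False
    # Unknown channel or unrecognised input: reprompt, never assume consent.
    return None
```

Returning `None` for anything unrecognised means silence or a misheard word never books the appointment by accident.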

6. Speech rate locked to a single default

This one trips on success criterion 1.4.2, Audio Control, in spirit if not in letter. A caller with mild hearing loss often hears better at 0.85x speed. A caller with a cochlear implant tuned to higher frequencies often prefers a deeper voice.

The minimum: a global "speak slower" command the agent recognises during any turn, plus a voice selection step in the IVR opener. The "speak slower" command should adjust the SSML prosody rate, not the playback speed of a pre-rendered file, because pre-rendered playback degrades intelligibility instead of improving it.
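
A minimal sketch of the "speak slower" path, assuming the TTS engine accepts SSML with a percentage `rate` on the `prosody` element. Engines differ in the range they accept and in how they read percentages, so the clamp, the step size, and the floor below are assumptions to check against the vendor's documentation.

```python
from xml.sax.saxutils import escape


def ssml_prompt(text: str, rate_percent: int = 100) -> str:
    """Wrap prompt text in an SSML prosody element at the requested rate."""
    # Clamp to an assumed safe range; engines differ in what they accept.
    rate_percent = max(50, min(rate_percent, 200))
    return (
        f'<speak><prosody rate="{rate_percent}%">'
        f"{escape(text)}"
        "</prosody></speak>"
    )


def slower(rate_percent: int, step: int = 15, floor: int = 70) -> int:
    """Apply one "speak slower" command, never dropping below the floor."""
    return max(floor, rate_percent - step)
```

Storing the current rate on the session and passing it to every prompt means the slower pace survives across turns.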

7. Background music or ambient prosody during prompts

Hold music is fine. Music underneath the agent's prompts is not. Procurement will not catch this from the questionnaire alone. They will catch it the first time they listen to a recorded session at 1.5x speed and notice the prompts are harder to parse than the hold music.

The relevant criterion is 1.4.7 (Low or No Background Audio), which is AAA and not strictly required. The relevant Dutch reality is that public-sector reviewers will mark this as a concern regardless of level. The fix is one line in the voice config: turn the background bed off during prompt playback. Save the ambient track for the wait state, where it actually helps the caller know the line is still alive.
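
If your platform exposes the bed as a gain value, the rule fits in one function. The state names and the default gain are hypothetical; this is a sketch of the rule, not any particular vendor's config.

```python
def background_gain(state: str, bed_gain: float = 0.3) -> float:
    """Return the background-track gain for a playback state (0.0 = silent)."""
    # Only the wait state keeps the ambient track; prompts play over silence.
    return bed_gain if state == "waiting" else 0.0
```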

8. No post-call transcript artifact for the caller

The last failure is the one that survives the call. A hearing caller can be reasonably confident about what the agent agreed to. A hearing-impaired caller has nothing to go back to.

WCAG 2.2 does not name this explicitly. The European Accessibility Act, which applies from 28 June 2025, treats post-transaction artifacts as part of the accessibility of the service itself. A transcript emailed to the caller within five minutes of hangup, with a short summary and the next action, closes the loop. A link to a webview that holds the same content for 30 days does the same job.

Make this default-on. Make the opt-out require explicit caller consent inside the call. The Dutch register will check.
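
A sketch of the default-on rule, assuming the opt-out is recorded as a boolean during the call. The `TranscriptArtifact` type and its field names are hypothetical; the 30-day retention mirrors the webview example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class TranscriptArtifact:
    summary: str
    next_action: str
    turns: List[str]
    expires_at: datetime


def build_artifact(turns: List[str], summary: str, next_action: str,
                   ended_at: datetime, opted_out: bool = False,
                   retention_days: int = 30) -> Optional[TranscriptArtifact]:
    """Build the post-call artifact; default-on unless the caller opted out."""
    if opted_out:
        return None
    return TranscriptArtifact(
        summary=summary,
        next_action=next_action,
        turns=list(turns),
        expires_at=ended_at + timedelta(days=retention_days),
    )
```

Because `opted_out` defaults to `False`, a flow that forgets to ask still produces the transcript.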

Takeaway

Voice agents fail WCAG 2.2 AA at the seams between the call and every other channel. Most of the eight failures above get fixed not by improving the agent, but by giving the caller a parallel surface (SMS, webview, email) that travels with the session.

What the second-round questionnaire actually asks

The questionnaire varies by ministry, but the shape is consistent. It will ask you to map each in-scope feature to an EN 301 549 clause, attach evidence (a screen recording, a transcript export, a config snippet), and declare any partial conformance. It will not ask "are you WCAG compliant." That phrasing is gone from modern Dutch tender language because it produces useless answers.

The most useful thing you can do before submitting is to record a five-minute simulated call where the caller is on Teletolk. Watch it back with the questionnaire open. The failures you spot are the ones the reviewer will spot. Fix those first, then re-record.

The wiring we ended up keeping

When we built a voice agent for a Dutch insurer last quarter, the thing we kept running into was item 3 above. The relay handoff broke the conversational state every time. We solved it by treating the relay operator as a privileged channel with its own pacing config, and by writing the agent's intent back to a webview the caller could see live. The webview also doubled as the surface for items 1, 5, and 8. If you want to see how that wiring looks, we wrote up the architecture under AI agents.

The five-minute audit you can run today: pick one of your live voice flows, record a session where the caller never speaks (only types via a relay simulator or a colleague using a keyboard), and listen back with the eight items above written on a Post-it. The first failure you hear is the one to fix this week.

Key takeaway

Voice agents fail WCAG 2.2 AA at the seams between the call and the rest of your stack. Fix it with a parallel surface, not a better model.

FAQ

Does WCAG 2.2 AA actually apply to a phone-based voice agent?

Directly, no. WCAG targets web content. In practice Dutch procurement maps voice services to EN 301 549, which embeds WCAG criteria and adds clauses for two-way voice and real-time text.

What is KPN Teletolk and why does it matter for procurement?

Teletolk is the Dutch text-relay service that bridges d/Deaf and hard-of-hearing callers to voice lines. Public-sector reviewers expect your agent to handle a relay-initiated call without losing state.

Is the European Accessibility Act enforceable against private companies in 2026?

Yes. The EAA has applied since 28 June 2025 and covers customer-facing services such as banking, e-commerce, transport ticketing, and electronic communications, including voice-agent interactions.

What is the cheapest single fix that closes the most gaps?

A web companion view tied to the call session. It carries the live transcript, the non-audio OTP, the confirm buttons, and the post-call artifact in one surface and travels with the caller via an SMS link.
