Video Captioning Requirements: WCAG 2.2 Media Accessibility Compliance

TestParty
March 27, 2025

Video content presents accessibility barriers that text does not. Deaf users cannot hear audio. Blind users cannot see visual-only information. Users in sound-sensitive environments cannot play audio. Without captions, transcripts, and audio descriptions, video content excludes significant user populations.

WCAG establishes clear requirements for making video and audio content accessible. This guide covers captioning requirements, transcript standards, audio description guidelines, and media player accessibility—everything needed for compliant video content.


Why Media Accessibility Matters

Video accessibility affects multiple user groups.

Deaf and Hard of Hearing Users

The World Health Organization has estimated that roughly 466 million people worldwide have disabling hearing loss. Without captions, deaf and hard of hearing users cannot access the audio in videos: dialogue, narration, sound effects, and music cues.

Blind and Low-Vision Users

Blind users cannot perceive visual-only information in videos. When important content appears only visually—on-screen text, demonstrations, actions, scene changes—audio descriptions make this content accessible.

Cognitive Disabilities

Captions benefit users with cognitive disabilities who process information more effectively when reading alongside listening. The dual-channel presentation reinforces comprehension.

Situational Limitations

Beyond permanent disabilities, captions serve:

  • Users in sound-sensitive environments (offices, public spaces)
  • Users with temporary hearing issues
  • Non-native language speakers
  • Users learning to read
  • Anyone in loud environments where audio isn't audible

Business Impact

Facebook has reported that captions on video ads increase view time by an average of 12%. LinkedIn found captioned videos receive 26% more engagement. Accessibility and engagement align.


WCAG Requirements for Media

Multiple success criteria address audio and video accessibility.

1.2.1 Audio-only and Video-only (Prerecorded) — Level A

For prerecorded audio-only content:

  • Provide a text transcript

For prerecorded video-only content (no audio):

  • Provide either a text transcript OR audio track describing the video content

1.2.2 Captions (Prerecorded) — Level A

For prerecorded video with audio:

  • Captions are provided for all prerecorded audio content in synchronized media

This is a Level A (minimum) requirement: prerecorded videos with audio need captions. The only exception is media that is itself a clearly labeled alternative to text already on the page.

1.2.3 Audio Description or Media Alternative (Prerecorded) — Level A

For prerecorded video with audio:

  • Provide either audio description OR a full text alternative (transcript including visual descriptions)

1.2.4 Captions (Live) — Level AA

For live video with audio:

  • Captions are provided for all live audio content in synchronized media

Real-time captioning (CART services or automatic speech recognition) is required for live video.

1.2.5 Audio Description (Prerecorded) — Level AA

For prerecorded video with audio:

  • Audio description is provided for all prerecorded video content

At Level AA, audio description is required—not just a full transcript alternative.

1.2.6 Sign Language (Prerecorded) — Level AAA

For prerecorded video with audio:

  • Sign language interpretation is provided

1.2.7 Extended Audio Description (Prerecorded) — Level AAA

For prerecorded video:

  • Where pauses in audio are insufficient for audio descriptions, extended audio description is provided

Extended audio description pauses video to allow longer descriptions.

1.2.8 Media Alternative (Prerecorded) — Level AAA

For prerecorded synchronized media and video-only:

  • A full text alternative is provided

1.2.9 Audio-only (Live) — Level AAA

For live audio-only content:

  • A text alternative is provided

Caption Requirements

Captions are the most fundamental video accessibility requirement.

What Captions Must Include

Captions must convey all audio content, not just dialogue.

Dialogue and Narration: All spoken words, accurately transcribed.

Speaker Identification: When multiple speakers appear, identify who is speaking.

[Sarah] The new design improves accessibility significantly.
[Tom] What specific changes did you make?

Sound Effects: Meaningful sounds that affect understanding.

[door slams]
[phone ringing]
[applause]

Music: When music conveys meaning or emotion.

[tense music playing]
[upbeat background music]
♪ Happy birthday to you ♪

Tone and Manner: When delivery affects meaning.

[sarcastically] Oh, that's just great.
[whispering] Don't tell anyone.

Caption Quality Standards

Accuracy: Captions must accurately represent spoken content. Aim for 99%+ accuracy. Auto-generated captions typically achieve 80-90% accuracy and require editing.
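One common way to put a number on caption accuracy is word error rate (WER): the word-level edit distance between a verified reference transcript and the caption text, divided by the number of reference words. The sketch below is a minimal illustration, assuming plain whitespace tokenization and case-insensitive matching. The function names are hypothetical, and real scoring tools usually also normalize punctuation and number formats.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Dynamic-programming edit distance, keeping one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def caption_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as a percentage (100 minus WER), floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis)) * 100
```

Scoring a short sample of each video this way gives a rough signal of whether auto-generated captions are anywhere near the 99% target before a human edit pass.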

Synchronization: Captions must appear in sync with audio. A commonly cited target is within 3 frames, or roughly 100 milliseconds, of the audio.

Readability:

  • Maximum 2 lines on screen at once
  • Maximum 32-42 characters per line
  • Minimum 1 second display time per caption
  • Position to avoid covering important visuals

Completeness: All meaningful audio must be captioned. Do not mark speech as "[inaudible]" when it is audible but merely hard to make out. Check the wording against the speaker or source material, or flag the uncertain words explicitly.
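The readability limits listed above lend themselves to an automated check. The following is a minimal sketch, assuming cues have already been parsed into a hypothetical `Cue` record and taking 42 characters (the upper end of the range above) as the line limit. Positioning and completeness still need a human reviewer.

```python
from dataclasses import dataclass

MAX_LINES = 2
MAX_CHARS_PER_LINE = 42      # upper end of the 32-42 range
MIN_DURATION_SECONDS = 1.0


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # caption text, lines separated by "\n"


def readability_problems(cue: Cue) -> list[str]:
    """Return a human-readable list of readability issues for one cue."""
    problems = []
    lines = cue.text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (max {MAX_LINES})")
    for line in lines:
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(
                f"line has {len(line)} characters (max {MAX_CHARS_PER_LINE})"
            )
    duration = cue.end - cue.start
    if duration < MIN_DURATION_SECONDS:
        problems.append(f"shown for {duration:.2f}s (min {MIN_DURATION_SECONDS}s)")
    return problems
```

An empty list means the cue passes all three checks. Running the function over every cue in a file gives a quick report to work through during editing.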

Caption File Formats

Common caption formats for web video:

WebVTT (.vtt): The web-native format. Recommended for HTML5 video.

WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to our accessibility tutorial.

00:00:04.500 --> 00:00:08.000
Today we'll cover captioning requirements.

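SRT (.srt): Widely supported by video platforms and simpler than WebVTT. Browsers do not read SRT through the HTML5 <track> element, so convert to WebVTT for self-hosted video.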

1
00:00:01,000 --> 00:00:04,000
Welcome to our accessibility tutorial.

2
00:00:04,500 --> 00:00:08,000
Today we'll cover captioning requirements.

TTML/DFXP: Used for broadcast and some streaming platforms.
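As the two samples above show, SRT and WebVTT differ mainly in the file header, the numeric cue counters, and the millisecond separator. Converting SRT to WebVTT can therefore be sketched in a few lines. This is a simplified illustration with a hypothetical function name. It assumes well-formed UTF-8 input without a byte-order mark and does not translate SRT styling tags.

```python
import re

# SRT timestamps put a comma before the milliseconds; WebVTT uses a period.
_SRT_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT caption text to WebVTT caption text."""
    normalized = srt_text.replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # Drop the numeric counter that precedes each SRT cue.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not any(line.strip() for line in lines):
            continue
        lines[0] = _SRT_TIMING.sub(r"\1.\2 --> \3.\4", lines[0])
        cues.append("\n".join(lines))
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

Applied to the SRT sample above, this yields the WebVTT sample shown before it.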

Implementing Captions

HTML5 Video:

<video controls>
  <source src="tutorial.mp4" type="video/mp4">
  <track kind="captions"
         src="tutorial-captions.vtt"
         srclang="en"
         label="English"
         default>
  <track kind="captions"
         src="tutorial-captions-es.vtt"
         srclang="es"
         label="Spanish">
</video>

YouTube:

  • Upload .vtt or .srt files via YouTube Studio
  • Auto-generated captions available (require editing)
  • Caption settings accessible to viewers

Vimeo:

  • Upload caption files via video settings
  • Multiple language support
  • Styling options available

Caption Types

Closed Captions (CC): Can be turned on/off by viewers. Standard for web video.

Open Captions: Burned into video, always visible. Use when caption controls may be inaccessible or for social media autoplay.

SDH (Subtitles for Deaf and Hard of Hearing): Include speaker identification and sound descriptions. More comprehensive than standard subtitles.


Transcript Requirements

Transcripts provide text alternatives for audio and video content.

When Transcripts Are Required

Audio-only content (podcasts, audio recordings): Transcripts required at Level A.

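Video with audio: Captions are still required at Level A (1.2.2). A descriptive transcript can additionally satisfy 1.2.3 at Level A in place of audio description, and it is required in its own right at Level AAA (1.2.8).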

Video-only (no audio): Text description of visual content required at Level A.

What Transcripts Must Include

For audio content:

  • All spoken dialogue and narration
  • Speaker identification
  • Relevant sound effects
  • Musical cues when meaningful

For video content (descriptive transcripts): All of the above, plus:

  • Description of visual actions
  • On-screen text
  • Scene changes
  • Visual information not in audio

Transcript Format

Transcripts should be:

  • HTML text (not PDF or image)
  • Located near the video/audio
  • Clearly labeled
  • Searchable and copyable

Example using a native disclosure element:

<details>
  <summary>Video Transcript</summary>
  <div class="transcript">
    <p><strong>Sarah:</strong> Welcome to our accessibility tutorial.</p>
    <p><em>[Screen shows WCAG logo]</em></p>
    <p><strong>Sarah:</strong> Today we'll cover the requirements for video captions...</p>
  </div>
</details>

Interactive Transcripts

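Enhanced transcripts allow users to activate a line of text to jump to that point in the video. Each entry should be a focusable control, such as a button, so keyboard and screen reader users can use it too:

<div class="interactive-transcript">
  <!-- Buttons keep each entry focusable and operable from the keyboard -->
  <p><button type="button" data-time="0">Welcome to our accessibility tutorial.</button></p>
  <p><button type="button" data-time="4.5">Today we'll cover captioning requirements.</button></p>
</div>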
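Markup like this is usually generated from the caption cues rather than written by hand. The sketch below shows one possible way to do that. The function name is hypothetical, and the input is assumed to be a list of (start time in seconds, text) pairs. Each entry is emitted as a button so it stays keyboard-focusable, and the text is escaped so caption content cannot inject markup.

```python
from html import escape


def transcript_html(cues: list[tuple[float, str]]) -> str:
    """Build interactive-transcript markup from (start_seconds, text) pairs."""
    rows = []
    for start_seconds, text in cues:
        rows.append(
            f'  <p><button type="button" data-time="{start_seconds:.3f}">'
            f"{escape(text)}</button></p>"
        )
    return '<div class="interactive-transcript">\n' + "\n".join(rows) + "\n</div>"
```

A small client-side script would still be needed to read `data-time` and set the player's current time when a button is activated.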

Audio Description Requirements

Audio description narrates visual information for blind users.

What Audio Description Covers

Visual actions: "Sarah walks to the whiteboard and writes 'WCAG 2.2' in large letters."

On-screen text: "Text appears: 'Three levels of conformance: A, AA, AAA.'"

Scene changes: "The scene shifts to an office meeting room."

Character appearances: "A man in a blue suit enters the room."

Non-verbal communication: "Sarah nods in agreement."

When Audio Description Is Required

Level A: Audio description OR full text alternative

Level AA: Audio description is required

For most compliance scenarios (ADA, EAA), Level AA means audio description is necessary.

Creating Audio Description

Timing: Descriptions fit in natural pauses in dialogue/narration. If no pauses exist, extended audio description (Level AAA) pauses the video.

Content: Describe what is seen; do not interpret it. "Sarah points at the chart" not "Sarah seems excited about the data."

Voice: Distinct from main audio, clear and neutral.

Length: Brief and efficient—describe essential visual information.
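Finding the natural pauses mentioned under Timing can be partly automated from an existing caption file, since caption cues mark where speech occurs. The following is a rough sketch with a hypothetical function name. It assumes cues are (start, end) pairs in seconds, and the 2-second minimum gap is an arbitrary illustrative threshold, not a standard.

```python
def description_windows(cues, video_duration, min_gap=2.0):
    """Return (start, end) pauses of at least min_gap seconds between cues."""
    windows = []
    cursor = 0.0  # end of the latest speech seen so far
    for start, end in sorted(cues):
        if start - cursor >= min_gap:
            windows.append((cursor, start))
        cursor = max(cursor, end)  # tolerate overlapping cues
    if video_duration - cursor >= min_gap:
        windows.append((cursor, video_duration))
    return windows
```

If the function returns few or no windows for a visually dense video, that is a hint that extended audio description or a descriptive transcript may be the more realistic route.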

Implementing Audio Description

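Text description track (a WebVTT file of descriptions intended to be voiced by assistive technology, not a recorded audio track):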

<video controls>
  <source src="tutorial.mp4" type="video/mp4">
  <track kind="captions" src="captions.vtt" srclang="en" label="English">
  <track kind="descriptions"
         src="descriptions.vtt"
         srclang="en"
         label="Audio Descriptions">
</video>

Note: Browser support for description tracks is limited. Alternative approaches include:

Separate video version: Provide a version with audio description mixed into the main audio track.

Audio description service: Platforms like YouDescribe allow community-contributed descriptions.


Media Player Accessibility

The player itself must be accessible.

Keyboard Accessibility

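Controls that must be operable by keyboard (the keys shown are common player conventions, not bindings mandated by WCAG; single-letter shortcuts such as M, F, and C must also meet 2.1.4 Character Key Shortcuts):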

  • Play/Pause (Space or Enter)
  • Volume (arrow keys)
  • Mute (M)
  • Full screen (F)
  • Seek (arrow keys or number keys)
  • Caption toggle (C)
  • Exit full screen (Escape)

Focus management:

  • All controls focusable via Tab
  • Visible focus indicators
  • Logical focus order

Screen Reader Accessibility

Control labeling:

<button aria-label="Play video">
  <span class="icon-play" aria-hidden="true"></span>
</button>

<button aria-label="Mute audio">
  <span class="icon-volume" aria-hidden="true"></span>
</button>

<button aria-label="Enable captions">
  <span class="icon-cc" aria-hidden="true"></span>
</button>

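State communication:

<!-- Play/pause: swap the label to name the available action; omit aria-pressed -->
<button aria-label="Pause video">
  <span class="icon-pause" aria-hidden="true"></span>
</button>

<!-- Captions: keep the label fixed and let aria-pressed carry the state -->
<button aria-label="Captions" aria-pressed="false">
  <span class="icon-cc" aria-hidden="true"></span>
</button>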

Progress communication: Current playback position should be programmatically available, for example through a seek slider that exposes aria-valuenow and a readable aria-valuetext.

Caption Control

Users must be able to:

  • Toggle captions on/off
  • Select caption language
  • Access caption settings (size, color, background) where available

Autoplay Restrictions

WCAG 1.4.2 Audio Control (Level A): Audio that plays automatically for more than 3 seconds must have controls to pause/stop or control volume independently of system volume.

Best practice: Never autoplay video with audio. If video autoplays, mute audio by default.
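A static scan can catch the most obvious violations of this best practice. The sketch below uses Python's standard `html.parser` to flag `<video>` and `<audio>` tags that declare `autoplay` without `muted`. The class name is hypothetical, and the check is deliberately narrow: it cannot see playback started from JavaScript or inside third-party iframes.

```python
from html.parser import HTMLParser


class AutoplayAudit(HTMLParser):
    """Collects line numbers of media tags that autoplay without being muted."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("video", "audio"):
            return
        names = {name for name, _ in attrs}
        if "autoplay" in names and "muted" not in names:
            self.flagged.append(self.getpos()[0])  # 1-based line number


def autoplay_with_audio(html_text: str) -> list[int]:
    audit = AutoplayAudit()
    audit.feed(html_text)
    audit.close()
    return audit.flagged
```

Each flagged line is a candidate for review rather than a confirmed failure, since a short clip or one with no audio track may still be acceptable.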

Platform-Specific Implementation

YouTube embeds:

<iframe
  src="https://www.youtube.com/embed/VIDEO_ID?cc_load_policy=1"
  title="Video title for screen readers"
  allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope"
  allowfullscreen>
</iframe>

Key parameters:

  • cc_load_policy=1: Shows captions by default
  • title attribute: Required for iframe accessibility
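When embed URLs are generated in code, building the query string with a URL library avoids hand-concatenation mistakes. This is a small sketch with a hypothetical helper name. `cc_load_policy` comes from the embed above; `cc_lang_pref` is an additional YouTube player parameter for the preferred caption language and is included here as an optional extra, so confirm its behavior against YouTube's current documentation.

```python
from typing import Optional
from urllib.parse import urlencode


def youtube_embed_url(video_id: str, caption_language: Optional[str] = None) -> str:
    """Build an embed URL that shows captions by default."""
    params = {"cc_load_policy": "1"}
    if caption_language:
        # Two-letter language code, e.g. "es" (assumed to match an uploaded track).
        params["cc_lang_pref"] = caption_language
    return f"https://www.youtube.com/embed/{video_id}?{urlencode(params)}"
```

The iframe still needs a descriptive `title` attribute, which this helper does not produce.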

Vimeo embeds:

<iframe
  src="https://player.vimeo.com/video/VIDEO_ID?texttrack=en"
  title="Video title for screen readers"
  allow="autoplay; fullscreen; picture-in-picture"
  allowfullscreen>
</iframe>

Live Media Accessibility

Live video and audio have specific requirements.

Live Captions (Level AA)

Live events require real-time captioning:

CART (Communication Access Realtime Translation): Human stenographers provide real-time captioning with 98%+ accuracy.

Automatic Speech Recognition (ASR): AI-powered real-time captions. Accuracy varies (85-95%). Requires clean audio input.

Live caption platforms:

  • Zoom built-in captions
  • Google Meet captions
  • Microsoft Teams captions
  • StreamText
  • 1CapApp

Live Audio Description

Not required by WCAG but beneficial for events with significant visual content (presentations, demonstrations).

Live Transcript (Level AAA)

Real-time text alternative for live audio-only content.


Common Media Accessibility Failures

Avoid these frequently encountered issues.

Auto-Generated Captions Without Editing

YouTube's automatic captions are a starting point, not a solution. They fail on:

  • Technical terminology
  • Accents
  • Multiple speakers
  • Background noise
  • Proper nouns

Always edit auto-generated captions for accuracy.

Captions Missing Non-Speech Audio

Captions that only capture dialogue miss:

  • Sound effects important to understanding
  • Musical cues
  • Background sounds that convey information

No Caption Controls

Embedded videos without visible caption toggle buttons leave users unable to enable captions.

Inaccessible Video Players

Custom video players that:

  • Lack keyboard controls
  • Have no focus indicators
  • Use unlabeled icon buttons
  • Hide controls without keyboard access

Autoplay With Audio

Videos that autoplay with audio:

  • Startle users
  • Conflict with screen readers
  • Violate WCAG 1.4.2 when the audio lasts more than 3 seconds and there is no way to pause, stop, or adjust its volume

No Transcript Provided

Captions help real-time viewing, but transcripts enable:

  • Full-text search
  • Reading at own pace
  • Copy/paste content
  • Offline access
  • SEO benefits

Captioning Workflows

Establish efficient processes for captioning video content.

DIY Captioning

Process:

  1. Generate initial transcript (auto-transcription or manual)
  2. Edit for accuracy
  3. Add timing/synchronization
  4. Include non-speech audio
  5. Export to caption format
  6. Test with video

Tools:

  • YouTube Studio (free, auto-generates starting point)
  • Descript (AI transcription with editing)
  • Aegisub (free, open-source caption editor)
  • Subtitle Edit (free, Windows)

Professional Captioning Services

For high accuracy or high volume:

  • Rev ($1.50+/minute)
  • 3Play Media (enterprise)
  • Verbit (AI + human)
  • CaptionSync

Professional services achieve 99%+ accuracy and handle technical content reliably.

Caption Quality Assurance

Checklist:

  • [ ] Accuracy: Compare to audio, verify technical terms
  • [ ] Synchronization: Captions match audio timing
  • [ ] Speaker identification: Multiple speakers distinguished
  • [ ] Sound effects: Non-speech audio included
  • [ ] Readability: Line length and duration appropriate
  • [ ] Completeness: All meaningful audio captioned

Testing Media Accessibility

Verify media accessibility through systematic testing.

Caption Testing

  1. Play video with captions enabled
  2. Verify all dialogue is captured
  3. Check speaker identification accuracy
  4. Confirm sound effects are noted
  5. Verify synchronization
  6. Check formatting/readability
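Before the manual pass above, a script can at least flag self-hosted videos that have no captions track at all. The sketch below is one way to do this with the standard `html.parser` module. The class and function names are hypothetical, and it only inspects `<video>` elements in static HTML, so platform embeds and tracks added by JavaScript need separate checks.

```python
from html.parser import HTMLParser


class CaptionTrackAudit(HTMLParser):
    """Records the line of each <video> that has no <track kind="captions">."""

    def __init__(self):
        super().__init__()
        self.missing = []
        self._video_line = None   # line of the <video> currently open
        self._has_captions = False

    def handle_starttag(self, tag, attrs):
        if tag == "video":
            self._video_line = self.getpos()[0]
            self._has_captions = False
        elif tag == "track" and self._video_line is not None:
            kind = dict(attrs).get("kind") or ""
            if kind.lower() == "captions":
                self._has_captions = True

    def handle_endtag(self, tag):
        if tag == "video" and self._video_line is not None:
            if not self._has_captions:
                self.missing.append(self._video_line)
            self._video_line = None


def videos_missing_captions(html_text: str) -> list[int]:
    audit = CaptionTrackAudit()
    audit.feed(html_text)
    audit.close()
    return audit.missing
```

A video that passes this check still needs the six manual steps, because the script confirms only that a track exists, not that its content is accurate.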

Player Accessibility Testing

  1. Navigate to video using keyboard only
  2. Test all controls via keyboard (play, pause, volume, seek, fullscreen, captions)
  3. Verify focus indicators on all controls
  4. Test with screen reader
  5. Verify controls are properly labeled

Transcript Testing

  1. Verify transcript exists and is linked
  2. Compare transcript to audio/video content
  3. Confirm visual descriptions are included (for video)
  4. Test transcript searchability

Taking Action

Media accessibility requires investment in captioning, transcripts, and audio descriptions—but the legal requirements are clear, and the benefits extend to all users.

Start by auditing existing video content for captions and transcripts. Establish captioning workflows for new content. Verify media player accessibility across your platform.

Schedule a TestParty demo and get a 14-day compliance implementation plan.

