Video Captioning Requirements: WCAG 2.2 Media Accessibility Compliance
Video content presents accessibility barriers that text does not. Deaf users cannot hear audio. Blind users cannot see visual-only information. Users in sound-sensitive environments cannot play audio. Without captions, transcripts, and audio descriptions, video content excludes significant user populations.
WCAG establishes clear requirements for making video and audio content accessible. This guide covers captioning requirements, transcript standards, audio description guidelines, and media player accessibility—everything needed for compliant video content.
Why Media Accessibility Matters
Video accessibility affects multiple user groups.
Deaf and Hard of Hearing Users
Approximately 466 million people worldwide have disabling hearing loss. Without captions, deaf users cannot access audio content in videos—dialogue, narration, sound effects, music cues.
Blind and Low-Vision Users
Blind users cannot perceive visual-only information in videos. When important content appears only visually—on-screen text, demonstrations, actions, scene changes—audio descriptions make this content accessible.
Cognitive Disabilities
Captions benefit users with cognitive disabilities who process information more effectively when reading alongside listening. The dual-channel presentation reinforces comprehension.
Situational Limitations
Beyond permanent disabilities, captions serve:
- Users in sound-sensitive environments (offices, public spaces)
- Users with temporary hearing issues
- Non-native language speakers
- Users learning to read
- Anyone in loud environments where the audio cannot be heard
Business Impact
Facebook has reported that captions on video ads increase view time by an average of 12%. LinkedIn found captioned videos receive 26% more engagement. Accessibility and engagement align.
WCAG Requirements for Media
Multiple success criteria address audio and video accessibility.
1.2.1 Audio-only and Video-only (Prerecorded) — Level A
For prerecorded audio-only content:
- Provide a text transcript
For prerecorded video-only content (no audio):
- Provide either a text transcript OR audio track describing the video content
1.2.2 Captions (Prerecorded) — Level A
For prerecorded video with audio:
- Captions are provided for all prerecorded audio content in synchronized media
This is a Level A (minimum) requirement—all websites need captions on prerecorded videos.
1.2.3 Audio Description or Media Alternative (Prerecorded) — Level A
For prerecorded video with audio:
- Provide either audio description OR a full text alternative (transcript including visual descriptions)
1.2.4 Captions (Live) — Level AA
For live video with audio:
- Captions are provided for all live audio content in synchronized media
Real-time captioning (CART services or automatic speech recognition) is required for live video.
1.2.5 Audio Description (Prerecorded) — Level AA
For prerecorded video with audio:
- Audio description is provided for all prerecorded video content
At Level AA, audio description is required—not just a full transcript alternative.
1.2.6 Sign Language (Prerecorded) — Level AAA
For prerecorded video with audio:
- Sign language interpretation is provided
1.2.7 Extended Audio Description (Prerecorded) — Level AAA
For prerecorded video:
- Where pauses in audio are insufficient for audio descriptions, extended audio description is provided
Extended audio description pauses video to allow longer descriptions.
1.2.8 Media Alternative (Prerecorded) — Level AAA
For prerecorded synchronized media and video-only:
- A full text alternative is provided
1.2.9 Audio-only (Live) — Level AAA
For live audio-only content:
- A text alternative is provided
Caption Requirements
Captions are the most fundamental video accessibility requirement.
What Captions Must Include
Captions must convey all audio content, not just dialogue.
Dialogue and Narration: All spoken words, accurately transcribed.
Speaker Identification: When multiple speakers appear, identify who is speaking.
[Sarah] The new design improves accessibility significantly.
[Tom] What specific changes did you make?
Sound Effects: Meaningful sounds that affect understanding.
[door slams]
[phone ringing]
[applause]
Music: When music conveys meaning or emotion.
[tense music playing]
[upbeat background music]
♪ Happy birthday to you ♪
Tone and Manner: When delivery affects meaning.
[sarcastically] Oh, that's just great.
[whispering] Don't tell anyone.
Caption Quality Standards
Accuracy: Captions must accurately represent spoken content. Aim for 99%+ accuracy. Auto-generated captions typically achieve 80-90% accuracy and require editing.
Synchronization: Captions must appear in sync with audio. A commonly cited industry target is within 3 frames, or roughly 100 milliseconds, of the audio.
Readability:
- Maximum 2 lines on screen at once
- Maximum 32-42 characters per line
- Minimum 1 second display time per caption
- Position to avoid covering important visuals
Completeness: All meaningful audio must be captioned. Do not mark speech as "inaudible" when it is audible but hard to make out; research the wording or indicate the uncertainty.
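The readability limits above lend themselves to an automated check. The following is a minimal Python sketch, not a standard tool: the `Cue` class, the `check_cue` function, and the choice of 42 as the hard character limit (the top of the 32-42 range) are assumptions made for illustration.

```python
from dataclasses import dataclass

# Limits taken from the readability list above.
MAX_LINES = 2        # lines on screen at once
MAX_CHARS = 42       # assumption: upper end of the 32-42 range
MIN_DURATION = 1.0   # seconds a caption stays on screen


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # caption text, lines separated by "\n"


def check_cue(cue: Cue) -> list[str]:
    """Return readability problems for one cue (empty list if none)."""
    problems = []
    lines = cue.text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (max {MAX_LINES})")
    for line in lines:
        if len(line) > MAX_CHARS:
            problems.append(f"line has {len(line)} characters (max {MAX_CHARS})")
    duration = cue.end - cue.start
    if duration < MIN_DURATION:
        problems.append(f"shown for {duration:.2f}s (min {MIN_DURATION:.0f}s)")
    return problems
```

Running a check like this over every cue before publishing flags captions that need to be split or retimed. It says nothing about accuracy or synchronization, which still need human review.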
Caption File Formats
Common caption formats for web video:
WebVTT (.vtt): The web-native format. Recommended for HTML5 video.
WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to our accessibility tutorial.

00:00:04.500 --> 00:00:08.000
Today we'll cover captioning requirements.

SRT (.srt): Widely supported, simpler than WebVTT.
1
00:00:01,000 --> 00:00:04,000
Welcome to our accessibility tutorial.

2
00:00:04,500 --> 00:00:08,000
Today we'll cover captioning requirements.

TTML/DFXP: Used for broadcast and some streaming platforms.
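Because the two samples above differ mainly in the header, the cue counters, and the decimal separator in timestamps, a simple SRT file can be converted to WebVTT mechanically. The sketch below is an illustration under that assumption; `srt_to_vtt` is a hypothetical helper, and it does not handle SRT styling tags or positioning data.

```python
import re

# Matches an SRT timestamp such as 00:00:01,000 and captures both halves.
_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT text."""
    normalized = srt_text.replace("\r\n", "\n").strip()
    blocks = re.split(r"\n\s*\n", normalized)
    cues = []
    for block in blocks:
        lines = block.split("\n")
        # SRT puts a numeric counter before each timing line; WebVTT does not need it.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines or "-->" not in lines[0]:
            continue  # not a cue block, skip it
        # WebVTT uses a period, not a comma, before the milliseconds.
        lines[0] = _TIMESTAMP.sub(r"\1.\2", lines[0])
        cues.append("\n".join(lines))
    # WebVTT requires the header line followed by a blank line.
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

Applied to the SRT sample above, this produces the WebVTT sample shown before it.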
Implementing Captions
HTML5 Video:
<video controls>
  <source src="tutorial.mp4" type="video/mp4">
  <track kind="captions"
         src="tutorial-captions.vtt"
         srclang="en"
         label="English"
         default>
  <track kind="captions"
         src="tutorial-captions-es.vtt"
         srclang="es"
         label="Spanish">
</video>
YouTube:
- Upload .vtt or .srt files via YouTube Studio
- Auto-generated captions available (require editing)
- Caption settings accessible to viewers
Vimeo:
- Upload caption files via video settings
- Multiple language support
- Styling options available
Caption Types
Closed Captions (CC): Can be turned on/off by viewers. Standard for web video.
Open Captions: Burned into video, always visible. Use when caption controls may be inaccessible or for social media autoplay.
SDH (Subtitles for Deaf and Hard of Hearing): Include speaker identification and sound descriptions. More comprehensive than standard subtitles.
Transcript Requirements
Transcripts provide text alternatives for audio and video content.
When Transcripts Are Required
Audio-only content (podcasts, audio recordings): Transcripts required at Level A.
Video with audio: A descriptive transcript, provided alongside captions, can satisfy the Level A requirement in 1.2.3. A full text alternative is required at Level AAA (1.2.8).
Video-only (no audio): Text description of visual content required at Level A.
What Transcripts Must Include
For audio content:
- All spoken dialogue and narration
- Speaker identification
- Relevant sound effects
- Musical cues when meaningful
For video content (descriptive transcripts): All of the above, plus:
- Description of visual actions
- On-screen text
- Scene changes
- Visual information not in audio
Transcript Format
Transcripts should be:
- HTML text (not PDF or image)
- Located near the video/audio
- Clearly labeled
- Searchable and copyable
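One way to meet these format points is to generate the transcript as real HTML text from structured data rather than pasting in a PDF or screenshot. The sketch below is illustrative only: `render_transcript` is a hypothetical helper, and the convention that an empty speaker name marks a visual description is an assumption of this example.

```python
from html import escape


def render_transcript(entries: list[tuple[str, str]]) -> str:
    """Build transcript markup from (speaker, text) pairs.

    An empty speaker marks a visual description, rendered in brackets.
    """
    parts = ['<div class="transcript">']
    for speaker, text in entries:
        if speaker:
            # Escape user-supplied text so it stays plain, copyable text.
            parts.append(
                f"<p><strong>{escape(speaker)}:</strong> {escape(text)}</p>"
            )
        else:
            parts.append(f"<p><em>[{escape(text)}]</em></p>")
    parts.append("</div>")
    return "\n".join(parts)
```

Because the output is ordinary markup, it remains searchable and selectable, and it can be placed directly beside the player.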
<details>
  <summary>Video Transcript</summary>
  <div class="transcript">
    <p><strong>Sarah:</strong> Welcome to our accessibility tutorial.</p>
    <p><em>[Screen shows WCAG logo]</em></p>
    <p><strong>Sarah:</strong> Today we'll cover the requirements for video captions...</p>
  </div>
</details>
Interactive Transcripts
Enhanced transcripts allow users to click text to jump to that point in the video:
<div class="interactive-transcript">
  <p data-time="0">Welcome to our accessibility tutorial.</p>
  <p data-time="4.5">Today we'll cover captioning requirements.</p>
</div>
Audio Description Requirements
Audio description narrates visual information for blind users.
What Audio Description Covers
Visual actions: "Sarah walks to the whiteboard and writes 'WCAG 2.2' in large letters."
On-screen text: "Text appears: 'Three levels of conformance: A, AA, AAA.'"
Scene changes: "The scene shifts to an office meeting room."
Character appearances: "A man in a blue suit enters the room."
Non-verbal communication: "Sarah nods in agreement."
When Audio Description Is Required
- Level A: Audio description OR full text alternative
- Level AA: Audio description is required
For most compliance scenarios (ADA, EAA), Level AA means audio description is necessary.
Creating Audio Description
Timing: Descriptions fit in natural pauses in dialogue/narration. If no pauses exist, extended audio description (Level AAA) pauses the video.
Content: Describe what's seen, not interpret. "Sarah points at the chart" not "Sarah seems excited about the data."
Voice: Distinct from main audio, clear and neutral.
Length: Brief and efficient—describe essential visual information.
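The timing rule above (descriptions fit into natural pauses) can be approximated from caption timings, since gaps between cues are roughly where nobody is speaking. This is a rough sketch under stated assumptions: `find_description_gaps` is a hypothetical helper, the 2-second minimum is an arbitrary illustrative threshold, and caption gaps do not account for music or sound effects that may also need to stay audible.

```python
def find_description_gaps(cues, min_gap=2.0, video_end=None):
    """Return (start, end) pauses in speech long enough for a description.

    cues: iterable of (start, end) times in seconds for dialogue/narration.
    min_gap: shortest pause worth using, in seconds (assumed value).
    video_end: total duration, if the pause after the last cue should count.
    """
    gaps = []
    cursor = 0.0  # end of the speech seen so far
    for start, end in sorted(cues):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        # max() keeps the cursor correct when cues overlap.
        cursor = max(cursor, end)
    if video_end is not None and video_end - cursor >= min_gap:
        gaps.append((cursor, video_end))
    return gaps
```

If a video yields few or no usable gaps, that is a signal that standard audio description may not fit and that extended audio description or a described version may be needed.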
Implementing Audio Description
Text description track:
<video controls>
  <source src="tutorial.mp4" type="video/mp4">
  <track kind="captions" src="captions.vtt" srclang="en" label="English">
  <track kind="descriptions"
         src="descriptions.vtt"
         srclang="en"
         label="Audio Descriptions">
</video>
Note: A descriptions track is timed text intended to be spoken by assistive technology, not recorded audio, and browser support for it is limited. Alternative approaches include:
Separate video version: Provide a version with audio description mixed into the main audio track.
Audio description service: Platforms like YouDescribe allow community-contributed descriptions.
Media Player Accessibility
The player itself must be accessible.
Keyboard Accessibility
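All player functions must be operable from the keyboard (WCAG 2.1.1). The keys below are common conventions rather than mandated bindings; single-letter shortcuts such as M, F, and C must be remappable, switchable off, or active only while the player has focus (WCAG 2.1.4).
Common keyboard controls: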
- Play/Pause (Space or Enter)
- Volume (arrow keys)
- Mute (M)
- Full screen (F)
- Seek (arrow keys or number keys)
- Caption toggle (C)
- Exit full screen (Escape)
Focus management:
- All controls focusable via Tab
- Visible focus indicators
- Logical focus order
Screen Reader Accessibility
Control labeling:
<button aria-label="Play video">
  <span class="icon-play" aria-hidden="true"></span>
</button>
<button aria-label="Mute audio">
  <span class="icon-volume" aria-hidden="true"></span>
</button>
<button aria-label="Enable captions">
  <span class="icon-cc" aria-hidden="true"></span>
</button>
State communication:
<!-- Play/pause: swap the label as the action changes; do not also set aria-pressed -->
<button aria-label="Pause video">
  <span class="icon-pause" aria-hidden="true"></span>
</button>
<!-- Captions toggle: keep the label fixed and let aria-pressed carry the state -->
<button aria-label="Captions" aria-pressed="false">
  <span class="icon-cc" aria-hidden="true"></span>
</button>
Progress communication: Current playback position should be programmatically available.
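One option for exposing playback position is a human-readable string on the seek control, for example as its `aria-valuetext`. The formatting logic is sketched below in Python for illustration; in a real player it would live in the player's own script. `describe_time` and `position_text` are hypothetical names, and the wording of the output is an assumption rather than a required format.

```python
def describe_time(seconds: float) -> str:
    """Format a time in seconds as words, e.g. '1 minute 5 seconds'."""
    total = max(0, int(seconds))  # drop fractions, clamp negatives to zero
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    parts = []
    for value, unit in ((hours, "hour"), (minutes, "minute"), (secs, "second")):
        if value:
            plural = "" if value == 1 else "s"
            parts.append(f"{value} {unit}{plural}")
    return " ".join(parts) or "0 seconds"


def position_text(current: float, duration: float) -> str:
    """Describe the playback position relative to the total duration."""
    return f"{describe_time(current)} of {describe_time(duration)}"
```

Spelled-out units tend to read more naturally in a screen reader than a raw "01:05 / 03:20" string, though either is better than exposing no position at all.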
Caption Control
Users must be able to:
- Toggle captions on/off
- Select caption language
- Access caption settings (size, color, background) where available
Autoplay Restrictions
WCAG 1.4.2 Audio Control (Level A): Audio that plays automatically for more than 3 seconds must have controls to pause/stop or control volume independently of system volume.
Best practice: Never autoplay video with audio. If video autoplays, mute audio by default.
Platform-Specific Implementation
YouTube embeds:
<iframe
  src="https://www.youtube.com/embed/VIDEO_ID?cc_load_policy=1"
  title="Video title for screen readers"
  allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope"
  allowfullscreen>
</iframe>
Key parameters:
- cc_load_policy=1: Shows captions by default
- title attribute: Required for iframe accessibility
Vimeo embeds:
<iframe
  src="https://player.vimeo.com/video/VIDEO_ID?texttrack=en"
  title="Video title for screen readers"
  allow="autoplay; fullscreen; picture-in-picture"
  allowfullscreen>
</iframe>
Live Media Accessibility
Live video and audio have specific requirements.
Live Captions (Level AA)
Live events require real-time captioning:
CART (Communication Access Realtime Translation): Human stenographers provide real-time captioning with 98%+ accuracy.
Automatic Speech Recognition (ASR): AI-powered real-time captions. Accuracy varies (85-95%). Requires clean audio input.
Live caption platforms:
- Zoom built-in captions
- Google Meet captions
- Microsoft Teams captions
- StreamText
- 1CapApp
Live Audio Description
Not required by WCAG but beneficial for events with significant visual content (presentations, demonstrations).
Live Transcript (Level AAA)
Real-time text alternative for live audio-only content.
Common Media Accessibility Failures
Avoid these frequently encountered issues.
Auto-Generated Captions Without Editing
YouTube's automatic captions are a starting point, not a solution. They fail on:
- Technical terminology
- Accents
- Multiple speakers
- Background noise
- Proper nouns
Always edit auto-generated captions for accuracy.
Captions Missing Non-Speech Audio
Captions that only capture dialogue miss:
- Sound effects important to understanding
- Musical cues
- Background sounds that convey information
No Caption Controls
Embedded videos without visible caption toggle buttons leave users unable to enable captions.
Inaccessible Video Players
Custom video players that:
- Lack keyboard controls
- Have no focus indicators
- Use unlabeled icon buttons
- Hide controls without keyboard access
Autoplay With Audio
Videos that autoplay with audio:
- Startle users
- Conflict with screen readers
- Violate WCAG 1.4.2 if the audio runs longer than 3 seconds with no way to pause, stop, or adjust its volume
No Transcript Provided
Captions help real-time viewing, but transcripts enable:
- Full-text search
- Reading at own pace
- Copy/paste content
- Offline access
- SEO benefits
Captioning Workflows
Establish efficient processes for captioning video content.
DIY Captioning
Process:
- Generate initial transcript (auto-transcription or manual)
- Edit for accuracy
- Add timing/synchronization
- Include non-speech audio
- Export to caption format
- Test with video
Tools:
- YouTube Studio (free, auto-generates starting point)
- Descript (AI transcription with editing)
- Aegisub (free, open-source caption editor)
- Subtitle Edit (free, Windows)
Professional Captioning Services
For high accuracy or high volume:
- Rev ($1.50+/minute)
- 3Play Media (enterprise)
- Verbit (AI + human)
- CaptionSync
Professional services achieve 99%+ accuracy and handle technical content reliably.
Caption Quality Assurance
Checklist:
- [ ] Accuracy: Compare to audio, verify technical terms
- [ ] Synchronization: Captions match audio timing
- [ ] Speaker identification: Multiple speakers distinguished
- [ ] Sound effects: Non-speech audio included
- [ ] Readability: Line length and duration appropriate
- [ ] Completeness: All meaningful audio captioned
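The accuracy item in the checklist can be given a number by comparing the caption text with a trusted reference transcript. The sketch below uses word-level edit distance (one minus word error rate), which is one common way to define accuracy; captioning vendors may measure it differently. `words` and `word_accuracy` are hypothetical helpers, and the score ignores punctuation, speaker labels, and sound-effect notations.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_accuracy(reference: str, captions: str) -> float:
    """Return 1 - word error rate, based on word-level edit distance."""
    ref, hyp = words(reference), words(captions)
    if not ref:
        return 1.0 if not hyp else 0.0
    # prev[j] holds the distance between the reference prefix so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from captions
                curr[j - 1] + 1,     # extra word in captions
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return max(0.0, 1.0 - prev[-1] / len(ref))
```

For example, captions reading "today will cover caption requirements" against the reference "Today we'll cover captioning requirements" contain two wrong words out of five, so the score is 0.6, far below the 99% target.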
Testing Media Accessibility
Verify media accessibility through systematic testing.
Caption Testing
- Play video with captions enabled
- Verify all dialogue is captured
- Check speaker identification accuracy
- Confirm sound effects are noted
- Verify synchronization
- Check formatting/readability
Player Accessibility Testing
- Navigate to video using keyboard only
- Test all controls via keyboard (play, pause, volume, seek, fullscreen, captions)
- Verify focus indicators on all controls
- Test with screen reader
- Verify controls are properly labeled
Transcript Testing
- Verify transcript exists and is linked
- Compare transcript to audio/video content
- Confirm visual descriptions are included (for video)
- Test transcript searchability
Taking Action
Media accessibility requires investment in captioning, transcripts, and audio descriptions—but the legal requirements are clear, and the benefits extend to all users.
Start by auditing existing video content for captions and transcripts. Establish captioning workflows for new content. Verify media player accessibility across your platform.
Schedule a TestParty demo and get a 14-day compliance implementation plan.