Experiment Without Excluding: How to Run A/B Tests That Respect Accessibility
TABLE OF CONTENTS
- Growth vs. Inclusion Doesn't Have to Be a Tradeoff
- Common Accessibility Risks in A/B Tests
- Embedding Accessibility into Experiment Design
- Testing Experiments for Accessibility Before Rollout
- Measuring Impact on Disabled Users
- How TestParty Can Scan Experiment Branches and Flags
- Frequently Asked Questions
- Conclusion – Growth Experiments That Don't Throw Users Overboard
A/B testing accessibility is often overlooked in the race for conversion optimization. Growth teams ship experimental variations designed to lift metrics, but these experiments frequently introduce accessibility barriers that exclude users with disabilities from the test population entirely—or worse, from using the product at all.
This creates a troubling dynamic: organizations optimize experiences for users who can access them while ignoring users who can't. The resulting "improvements" may actually harm overall accessibility posture, and the metrics driving decisions are systematically biased toward users without disabilities.
The good news: experiment accessibility doesn't require choosing between growth and inclusion. Accessible A/B test strategies simply mean applying the same accessibility standards to experimental variations that you apply (or should apply) to production code. This guide covers how to embed accessibility into experiment design, QA, and measurement—so your optimization program respects all users.
Growth vs. Inclusion Doesn't Have to Be a Tradeoff
The Hidden Exclusion Problem
What is accessible A/B testing? Accessible A/B testing ensures that all experimental variations—including control and treatment groups—meet accessibility standards, so users with disabilities can participate in experiments and aren't excluded from improved experiences.
When growth teams run experiments, they typically measure:
- Click-through rates
- Conversion rates
- Time on page
- Funnel completion
But these metrics exclude users who couldn't interact with the experiment at all. If a pop-up variant traps keyboard users, they'll either struggle through or abandon, and neither outcome shows up in the data as "the pop-up caused an accessibility problem."
According to CDC data, 27% of US adults have some form of disability. If your experiments systematically exclude this population, your "winning" variants may actually be worse for overall user experience—you just can't see it in the data.
The Business Case for Inclusive Experiments
Accessible experimentation isn't just ethical—it's better science:
Larger sample sizes: Including users with disabilities increases your test population, improving statistical significance and reducing time to results.
More representative data: Experiments that exclude user segments produce biased results. Winners may only win for a subset of your actual users.
Reduced legal risk: Shipping inaccessible experiments creates the same legal exposure as shipping any inaccessible code.
Better actual outcomes: Accessibility improvements often improve usability for everyone. Inclusive experiments may reveal that accessible variants outperform inaccessible ones.
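The "larger sample sizes" point can be made concrete with back-of-envelope math. The sketch below uses the standard two-proportion sample-size approximation; the baseline rate, target lift, and traffic numbers are illustrative assumptions, not figures from this article.

```javascript
// Rough sample size per arm to detect a lift between two proportions,
// using the common normal-approximation formula (alpha = 0.05, power = 0.8).
function sampleSizePerArm(baseline, minLift, zAlpha = 1.96, zBeta = 0.84) {
  const p = baseline + minLift / 2; // midpoint proportion as a rough pooled rate
  return Math.ceil((2 * (zAlpha + zBeta) ** 2 * p * (1 - p)) / minLift ** 2);
}

// Days needed to reach that sample size, given how much of your traffic
// can actually participate in the experiment.
function daysToSignificance(nPerArm, dailyVisitors, eligibleFraction) {
  return Math.ceil((2 * nPerArm) / (dailyVisitors * eligibleFraction));
}

const n = sampleSizePerArm(0.10, 0.02); // 10% baseline, +2pp minimum lift
const allUsers = daysToSignificance(n, 5000, 1.0);  // everyone can participate
const excluded = daysToSignificance(n, 5000, 0.73); // 27% locked out by barriers
// `excluded` is never smaller than `allUsers`: exclusion slows every test down.
```

The exact constants matter less than the direction: shrinking the eligible population can only lengthen the time every experiment takes to reach significance.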
Common Accessibility Risks in A/B Tests
Pop-Ups and Interstitials
Pop-ups are growth team favorites—email capture, promotions, exit intent. They're also accessibility nightmares:
Focus management failures: Pop-up appears but focus stays on the underlying page. Screen reader users don't know the pop-up exists; keyboard users can't interact with it.
Focus traps: Pop-up captures focus but provides no way to close with keyboard (missing Escape handler, close button not keyboard-focusable).
No announcements: Pop-up content not announced to screen readers. Assistive technology users experience a silent takeover of their viewport.
Obscured content: WCAG 2.4.11 Focus Not Obscured requires that focused elements remain visible. Pop-ups covering focused content violate this.
Timing issues: Pop-ups appearing based on time or scroll position may interrupt screen reader users mid-sentence.
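The focus-management and focus-trap failures above come down to a small set of keyboard rules every pop-up variant must implement. A minimal sketch, modeled as a pure function over an ordered list of focusable items so the logic is checkable without a DOM; in production the same rules apply to real elements inside the dialog.

```javascript
// Keyboard rules for a dialog: Escape always closes, Tab cycles focus
// within the dialog (wrapping at both ends), other keys leave focus alone.
// `focusables` is the ordered list of focusable items in the dialog,
// `current` is the index of the item that currently has focus.
function dialogKeyHandler(key, shiftKey, current, focusables, onClose) {
  if (key === 'Escape') {
    onClose();                 // Escape must always dismiss the dialog
    return current;
  }
  if (key === 'Tab') {
    const last = focusables.length - 1;
    if (!shiftKey) return current === last ? 0 : current + 1; // wrap forward
    return current === 0 ? last : current - 1;                // wrap backward
  }
  return current;
}

let closed = false;
const items = ['close-button', 'email-field', 'submit-button'];
const afterTab = dialogKeyHandler('Tab', false, 2, items, () => {});   // wraps to 0
dialogKeyHandler('Escape', false, 1, items, () => { closed = true; }); // closes
```

A real implementation also moves focus into the dialog on open, returns it to the trigger on close, and announces the dialog to screen readers (for example via `role="dialog"` and `aria-modal="true"`).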
New Navigation Patterns
Growth teams test navigation variations to improve findability:
Hamburger menus on desktop: Hiding navigation behind a menu icon may reduce accessibility by removing visible landmarks.
Gesture-only navigation: Swipe carousels, pull-to-reveal menus, and gesture-dependent interactions exclude users who can't perform gestures.
Mega-menus: Complex dropdown menus with multiple columns often have poor keyboard navigation and screen reader support.
Sticky navigation changes: Modified sticky headers may obscure focused content or change the interaction model users expect.
Motion and Micro-Animations
Animation experiments introduce accessibility risks:
Vestibular triggers: Motion-heavy variations can cause dizziness, nausea, or disorientation for users with vestibular disorders. WCAG 2.3.3 Animation from Interactions addresses this.
Cognitive load: Animated elements can distract users with attention disorders, making content harder to consume.
Prefers-reduced-motion ignored: Variations that don't respect the prefers-reduced-motion media query force motion on users who've explicitly opted out.
Auto-playing content: Video or animated content that plays automatically without user control violates WCAG 2.2.2 Pause, Stop, Hide.
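One way to keep motion experiments inside these constraints is to route every variant's animation settings through a single decision function. A sketch under assumed settings: in a browser, the flag would come from `window.matchMedia('(prefers-reduced-motion: reduce)').matches`; here it is a plain boolean so the decision logic stands alone.

```javascript
// Pick animation settings for a variant based on the user's motion preference.
// The specific fields and durations are illustrative assumptions.
function animationSettings(prefersReducedMotion) {
  if (prefersReducedMotion) {
    // Motion collapses: no parallax, no auto-play, near-instant transitions.
    return { parallax: false, autoplay: false, pauseControl: true, transitionMs: 0 };
  }
  // Full motion is still bounded: auto-play only ever ships with a pause control.
  return { parallax: true, autoplay: true, pauseControl: true, transitionMs: 250 };
}

const reduced = animationSettings(true);
const full = animationSettings(false);
```

Centralizing the decision means a motion-heavy treatment can't accidentally skip the reduced-motion path: every variant inherits it.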
Embedding Accessibility into Experiment Design
Adding Accessibility to Hypothesis and Success Criteria
How do you make A/B tests accessible? Include accessibility requirements in experiment design, use pre-approved accessible patterns, test variations with keyboard and screen reader before launch, and measure accessibility impact alongside conversion metrics.
Reframe experiment success criteria:
Traditional hypothesis: "Adding a sticky CTA banner will increase signups by 10%."
Accessible hypothesis: "Adding a sticky CTA banner will increase signups by 10% without reducing completion rates for keyboard or screen reader users."
Success criteria additions:
- All experimental variations pass automated accessibility scans
- Manual accessibility QA confirms keyboard and screen reader compatibility
- No increase in accessibility-related support contacts
- Completion rates don't decline for users with accessibility settings enabled
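The reframed hypothesis can be expressed as a ship/no-ship check: the variant must hit its lift target without degrading completion for any tracked accessibility segment. A sketch with assumed field names and an assumed 0.5-point noise tolerance.

```javascript
// Ship only if the primary metric lifts AND no accessibility segment declines.
// `completionDelta` is treatment completion rate minus control, per segment.
function shouldShip(result) {
  const liftOk = result.overallLift >= result.targetLift;
  const noSegmentHarm = result.segments.every(
    (s) => s.completionDelta >= -0.005 // tolerate 0.5pp of noise, no real decline
  );
  return liftOk && noSegmentHarm;
}

const outcome = shouldShip({
  overallLift: 0.12,
  targetLift: 0.10,
  segments: [
    { name: 'keyboard-only', completionDelta: 0.01 },
    { name: 'screen-reader', completionDelta: -0.04 }, // regression
  ],
});
// The screen reader decline blocks the ship despite the overall lift.
```

Treating segment harm as a hard guardrail, rather than a note in the readout, is what makes the accessible hypothesis enforceable.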
Guardrails for Designers and Engineers
Establish pre-approved patterns for common experiment types:
Pop-up/modal pattern: Focus management required. Escape closes. Screen reader announcements. Focus returns to trigger on close.
Navigation pattern: All menu items keyboard accessible. ARIA attributes for expanded/collapsed state. Focus visible throughout.
Animation pattern: Respects prefers-reduced-motion. No auto-play without pause control. No content flashing more than three times per second.
Form variation pattern: All fields labeled. Error messages associated with fields. Clear instructions for required fields.
When experiment designers work from accessible pattern libraries, they can innovate within safe boundaries rather than reinventing (and breaking) accessibility with each test.
Experiment Accessibility Checklist
Before any variation launches:
Structure:
- [ ] Heading hierarchy maintained
- [ ] Landmark regions intact
- [ ] Reading order logical
Interaction:
- [ ] All interactive elements keyboard accessible
- [ ] Focus order follows visual layout
- [ ] Focus indicators visible
- [ ] Focus management correct for modals/overlays
Content:
- [ ] Text meets contrast requirements
- [ ] Images have appropriate alt text
- [ ] Link text descriptive
- [ ] Instructions clear and visible
Motion:
- [ ] Respects prefers-reduced-motion
- [ ] Auto-play has pause control
- [ ] No content flashing more than 3 times per second
Testing Experiments for Accessibility Before Rollout
Pre-Launch QA Process
Automated scanning: Run accessibility scans on all variations before launch. TestParty integrates with CI/CD to scan feature branches, catching issues before code reaches production.
Keyboard testing: Navigate each variation using only keyboard. Verify all interactions work, focus is visible, and there are no traps.
Screen reader testing: Test each variation with VoiceOver or NVDA. Verify content is announced correctly, dynamic changes are communicated, and interactions are clear.
Reduced motion testing: Enable prefers-reduced-motion in browser settings. Verify animations are eliminated or reduced appropriately.
Document findings: Create accessibility acceptance criteria specific to each experiment. Record testing results.
Feature Flag Integration
Modern experimentation platforms deliver variations through feature flags. Integrate accessibility checks at the flag level:
Scan all flag variations: When feature flags control experimental code paths, scan each path for accessibility.
A/B test accessibility: Don't just scan control—scan all treatment groups. The variant you're testing may have different accessibility characteristics.
Gradual rollout consideration: As experiments roll out to larger percentages, accessibility issues affect more users. Catch problems at low rollout percentages.
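Scanning every flag variation and gating on the results can be reduced to a small piece of CI logic. A sketch under stated assumptions: `findings` stands in for whatever your scanner emits per variant, and the severity names mirror the impact levels common in accessibility tooling.

```javascript
// Gate each flag variant on its scan findings: block any variant whose
// worst finding meets or exceeds the configured severity threshold.
function gateVariants(variants, maxSeverity = 'serious') {
  const rank = { minor: 1, moderate: 2, serious: 3, critical: 4 };
  const limit = rank[maxSeverity];
  return variants.map((v) => ({
    name: v.name,
    blocked: v.findings.some((f) => rank[f.impact] >= limit),
  }));
}

const gates = gateVariants([
  { name: 'control', findings: [] },
  { name: 'treatment-a', findings: [{ id: 'focus-trap', impact: 'critical' }] },
  { name: 'treatment-b', findings: [{ id: 'low-contrast', impact: 'moderate' }] },
]);
// control and treatment-b pass; treatment-a is blocked before rollout.
```

The key property is symmetry: control and every treatment pass through the same gate, so an experiment can't launch a path that was never scanned.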
Monitoring Once Live
Accessibility monitoring continues after launch:
Error rate tracking: Monitor for increased errors or timeouts that might indicate accessibility barriers.
Support ticket analysis: Watch for contacts mentioning difficulty with new features or changed interactions.
Feedback channels: Provide mechanisms for users to report accessibility issues with experimental features.
Session analysis: Where available, review session recordings for signs of struggle—repeated clicks, navigation confusion, rage clicks.
Measuring Impact on Disabled Users
Segmenting Experiment Data
Directly measuring accessibility impact requires identifying which users rely on assistive technology or accessibility settings, typically through imperfect proxies:
Accessibility settings proxies: Some platforms can identify users with OS-level accessibility settings enabled, though this data is limited and privacy-sensitive.
Keyboard-only sessions: Sessions with no mouse events may indicate keyboard-only users.
Assistive technology detection: Some analytics can detect screen reader usage, though this raises privacy considerations.
Self-identification: Optional accessibility preference settings allow users to indicate needs.
Data segmentation has limitations. Not all users with disabilities use detectable settings or assistive technologies. Privacy regulations may restrict data collection. Use segmentation as directional signal, not definitive measurement.
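The keyboard-only proxy above is simple to implement against a session event log. A minimal sketch with assumed event shapes; as the caveat says, treat the result as a directional signal, not a definitive marker of disability.

```javascript
// Flag a session as likely keyboard-only: keyboard activity present,
// no pointer activity of any kind. Event objects are assumed to carry
// a `type` field matching DOM event names.
function isLikelyKeyboardOnly(events) {
  const hasKeys = events.some((e) => e.type === 'keydown');
  const hasPointer = events.some(
    (e) => e.type === 'mousemove' || e.type === 'click' || e.type === 'touchstart'
  );
  return hasKeys && !hasPointer;
}
```

Segmenting experiment metrics by this flag is one way to surface the keyboard-user completion rates the accessible hypothesis asks about.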
Alternative Measurement Approaches
When direct segmentation isn't feasible:
Accessibility-focused user testing: Include users with disabilities in usability testing of experimental variations.
Manual accessibility audits: Expert evaluation of variations before and during rollout.
Support contact analysis: Qualitative review of support interactions for accessibility-related themes.
Accessibility metric tracking: Track overall accessibility scores and watch for degradation during experiment periods.
Interpreting Results Inclusively
Challenge results that might hide accessibility problems:
Overall lifts masking segment harm: A 5% conversion lift overall might come from 8% improvement for mouse users and 10% decline for keyboard users. The decline is hidden in aggregate.
Selection effects: If inaccessible variations cause users with disabilities to abandon, they're no longer in your test population. Your results are biased toward users who could complete the flow.
Long-term versus short-term: Short-term conversion lifts from aggressive tactics might cost long-term trust and accessibility reputation.
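The "overall lift masking segment harm" effect is easy to demonstrate with made-up numbers; the figures below are illustrative, not data from any real experiment.

```javascript
function conversionRate(users) {
  return users.converted / users.total;
}

function lift(control, treatment) {
  return conversionRate(treatment) - conversionRate(control);
}

// Hypothetical segment counts: mouse users improve, keyboard users are harmed.
const control = {
  mouse:    { total: 9000, converted: 900 },  // 10%
  keyboard: { total: 1000, converted: 100 },  // 10%
};
const treatment = {
  mouse:    { total: 9000, converted: 1080 }, // 12%
  keyboard: { total: 1000, converted: 50 },   // 5%
};

const overall = lift(
  { total: 10000, converted: 1000 },
  { total: 10000, converted: 1130 }
);
const keyboardLift = lift(control.keyboard, treatment.keyboard);
// Overall conversion rises even though keyboard users' conversion halved.
```

Because the harmed segment is the smaller one, the aggregate number looks like a clean win, which is exactly why segment-level readouts belong in every experiment report.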
How TestParty Can Scan Experiment Branches and Flags
Scanning Feature Branches in CI
TestParty integrates with development workflow to catch experiment accessibility issues early:
Branch-level scanning: When experiments are developed in feature branches, TestParty scans those branches in CI, identifying issues before merge.
PR comments: Accessibility findings appear directly in pull request comments, enabling developers to fix issues before experiments launch.
Blocking deployments: Configure CI to prevent deployment of experiments with critical accessibility issues.
Monitoring Live Variants
For experiments in production:
Multi-variant scanning: TestParty can scan different experimental variations based on feature flag states, identifying accessibility differences between control and treatment.
Regression detection: If an experiment introduces accessibility regressions, TestParty catches them through continuous monitoring.
Comparison reporting: See accessibility status of different variations side-by-side to evaluate which approaches are more accessible.
Experiment-Specific Dashboards
Track accessibility across your experimentation program:
Experiments scanned: What percentage of experiments receive accessibility evaluation?
Issues by experiment: Which experiments introduced accessibility problems?
Fix rates: How quickly are experiment-related accessibility issues resolved?
Pattern analysis: What types of experiments tend to cause accessibility problems?
Frequently Asked Questions
Do we need to accessibility test every experiment?
Yes. Every experiment that changes the user interface has potential accessibility impact. The question is the level of testing: automated scans for all experiments, with more thorough manual testing for experiments involving navigation, forms, modals, or significant UI changes. Consistent baseline testing prevents accessibility regressions regardless of experiment type.
What if accessibility slows down our experimentation velocity?
Accessibility testing adds some time, but it's minimal with good tooling and processes. Automated scanning takes minutes. If accessibility consistently slows experiments, that usually indicates underlying component problems—fix the components, and accessibility testing becomes routine verification rather than discovery of new issues.
How do we handle experiments that show lifts but have accessibility issues?
Don't ship inaccessible variants to production, even if they show conversion lifts. Fix the accessibility issues first, then validate that the fix maintains the lift. Often, the accessible version performs equally well or better. If fixing accessibility truly reduces the metric, you need to decide whether the lift justifies excluding users with disabilities.
Should users with disabilities be excluded from experiments?
No. Excluding users with disabilities from experiments is both unethical and produces biased results. Instead, ensure all experimental variations are accessible so everyone can participate. If you're testing a feature that might affect accessibility, that's more reason to include users with disabilities, not less.
What if our experimentation platform doesn't support accessibility scanning?
Use external tools. TestParty and similar platforms can scan live variations regardless of which experimentation platform you use. Integrate accessibility scanning into your CI/CD pipeline where experimental code is developed. The experimentation platform delivers variations; accessibility testing happens in your development and monitoring processes.
Conclusion – Growth Experiments That Don't Throw Users Overboard
Experiment accessibility isn't a constraint on growth—it's a requirement for legitimate growth. Experiments that exclude users with disabilities produce biased metrics, create legal exposure, and optimize for a subset of your actual user base.
Inclusive experimentation means:
- Accessible variations by design, using pre-approved patterns and component libraries
- Accessibility criteria in experiment hypotheses and success metrics
- Pre-launch testing with automated scans, keyboard testing, and screen reader verification
- Live monitoring for accessibility issues after experiments launch
- Inclusive measurement that considers impact on all user segments, not just aggregates
The organizations winning at experimentation aren't those running the most tests—they're those running tests that work for all users and produce valid, actionable insights.
Running lots of experiments? Book a demo with TestParty to set up automated accessibility checks on your feature flags and variants.