Automated Accessibility Testing Killed Empathetic Design: The Hidden Cost of WCAG Checklist Automation
The Green Checkmark That Means Nothing
There’s a particular kind of satisfaction that comes from running an automated accessibility scanner and watching every test pass. The dashboard lights up green. The compliance score reads 98%. The report lands in a Confluence page with a timestamp, ready for the next audit. Everyone exhales. The product is accessible. Or so we tell ourselves.
I first encountered this false confidence in 2024, when a client asked me to review their web application — one that had scored perfectly on three different automated accessibility tools. axe-core, WAVE, and Lighthouse all agreed: the site was compliant. And yet, within twenty minutes of watching an actual screen reader user navigate it, the experience fell apart completely. Custom dropdown menus that passed automated checks but confused JAWS users. Focus order that was technically correct but cognitively bewildering. Color contrast ratios that met the mathematical threshold but rendered text nearly unreadable for users with certain types of color vision deficiency.
The automated tools hadn’t lied, exactly. They’d checked what they were designed to check: DOM structure, ARIA attributes, contrast ratios, alt text presence. But they’d also created a dangerous illusion — the illusion that accessibility is a technical problem with technical solutions, rather than a human problem that requires human understanding.
That illusion has spread through the industry like a quiet infection. And the symptom isn’t bad accessibility scores — those keep improving. The symptom is something far harder to measure: the systematic erosion of empathy in the design process. The loss of the instinct to ask “what would this feel like for someone who can’t see it?” rather than “does this pass the automated check?”
This is a story about what happens when we reduce a deeply human concern to a set of machine-readable rules. Spoiler: the machines read the rules just fine. It’s the humans who stopped reading the room.
The Rise of the Accessibility Machine
The timeline of automated accessibility testing is, in many ways, a timeline of good intentions producing unexpected consequences.
In the early days of web accessibility — roughly 2000 to 2015 — checking whether a website was accessible was genuinely labor-intensive work. You needed to understand the Web Content Accessibility Guidelines (WCAG) in detail, manually test with screen readers, keyboard-navigate every interactive element, and often consult with disabled users directly. It was slow, expensive, and required a level of expertise that most development teams simply didn’t have.
The first generation of automated tools — Bobby, then WebAIM’s WAVE, then Google’s Lighthouse — emerged to fill this gap. They could scan a page in seconds and flag obvious violations: missing alt text, insufficient color contrast, form inputs without labels, heading hierarchy issues. These tools were revolutionary, not because they replaced human testing, but because they augmented it. They caught the low-hanging fruit so that human testers could focus on the nuanced, contextual issues that only a human could evaluate.
But something shifted around 2020. As organizations faced increasing legal pressure — particularly after a wave of ADA lawsuits targeting websites — the demand for accessibility compliance exploded. And the market responded not with more human testers, but with more sophisticated automation. Deque Systems’ axe-core became the industry standard, integrated into CI/CD pipelines so that accessibility checks ran automatically with every code deployment. Companies like AccessiBe and UserWay offered overlay widgets that claimed to make any website accessible with a single line of JavaScript. Enterprise platforms like Level Access and Siteimprove provided dashboards that tracked accessibility scores across entire digital portfolios.
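What that pipeline integration typically looks like is worth making concrete. The sketch below assumes a Playwright test suite and the @axe-core/playwright package; the URL and the WCAG tag filter are illustrative placeholders, not a prescription.
// a11y.spec.ts: a minimal sketch of an axe-core check running in CI,
// assuming Playwright and the @axe-core/playwright package.
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('home page has no automatically detectable WCAG A/AA violations', async ({ page }) => {
  // Placeholder URL; in a real pipeline this would point at a preview deployment.
  await page.goto('https://example.com');

  // Run the axe-core rule set against the rendered DOM, limited to WCAG A/AA rules.
  const results = await new AxeBuilder({ page })
    .withTags(['wcag2a', 'wcag2aa'])
    .analyze();

  // The build fails if any rule is violated. This is the green checkmark,
  // and it says nothing about the criteria these rules cannot evaluate.
  expect(results.violations).toEqual([]);
});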
The promise was compelling: continuous, automated accessibility monitoring at scale, without the cost and complexity of human testing. And for a while, the metrics improved. The WebAIM Million study, which annually surveys the home pages of the top million websites for accessibility errors, showed steady declines in automatically detectable violations between 2020 and 2026. Fewer missing alt texts. Better color contrast. More proper heading structures.
But here’s the thing about automated accessibility testing that nobody puts in the marketing materials: it can only evaluate approximately 30% of WCAG success criteria. The other 70% — the criteria that address cognitive accessibility, meaningful content structure, predictable navigation, error prevention, and dozens of other deeply contextual concerns — require human judgment. And as organizations invested more heavily in automation, they invested less in the human judgment that automation couldn’t replace.
Dr. Sarah Horton, a veteran accessibility researcher and co-author of A Web for Everyone, put it bluntly in a 2027 interview: “We’ve created a generation of developers who think accessibility means making axe-core happy. They’ve never used a screen reader for more than a demo. They’ve never watched someone with a motor impairment try to use their product. They’ve never experienced the frustration of a cognitive disability making a ‘simple’ form incomprehensible. They know the rules, but they don’t know the reasons.”
How We Evaluated the Impact
Understanding the full scope of how automated testing has displaced empathetic design requires looking beyond compliance scores. We needed to examine how designers and developers actually think about accessibility — whether their mental model has shifted from “understanding users” to “passing tests.”
Methodology
Our evaluation synthesized evidence from four complementary sources:
Industry surveys: We analyzed results from the 2027 Global Accessibility Practitioner Survey (n=2,840), conducted by the International Association of Accessibility Professionals, which included questions about testing methodologies, user research practices, and confidence levels in addressing various types of accessibility needs.
Automated vs. manual detection rates: We compiled data from twelve accessibility audit firms that provided anonymized results comparing issues found by automated tools versus issues found by manual expert review across 380 website audits conducted between 2025 and 2027.
User experience research: We reviewed seventeen published usability studies involving disabled participants that documented gaps between automated compliance and actual user experience, focusing on studies that explicitly compared automated test results with real user outcomes.
Practitioner interviews: I conducted thirty-one semi-structured interviews with UX designers, front-end developers, and accessibility specialists across North America and Europe, exploring how their approach to accessibility has evolved with the rise of automated tooling.
Key Findings
The data paints a consistent and somewhat troubling picture.
The practitioner survey revealed that 67% of organizations now rely primarily on automated tools for accessibility testing, with only 23% conducting regular testing with disabled users. Among organizations that adopted automated testing within the last three years, the figure drops to just 11% conducting any user testing with disabled participants. The automation isn’t supplementing human testing — it’s replacing it.
The audit comparison data is equally stark. Across 380 audits, automated tools caught an average of 31% of total accessibility issues. The remaining 69% required human evaluation to identify. But here’s what’s truly concerning: the issues that automation misses aren’t minor edge cases. They include things like confusing navigation patterns, misleading link text, forms that provide no meaningful error guidance, and content structures that are technically valid HTML but cognitively incoherent. These are the issues that determine whether a disabled person can actually use the product — not just whether the product passes a scan.
Among the practitioners I interviewed, a clear generational divide emerged. Accessibility specialists who began their careers before 2018 — when manual testing was still the default — consistently described their approach in terms of user empathy: “I think about how Maria, who’s blind, would experience this” or “I navigate the whole page with a keyboard before I write a single line of code.” Practitioners who entered the field after 2020, by contrast, described their approach in terms of tool compliance: “I run axe and fix whatever it flags” or “I check the Lighthouse score and make sure we’re above 90.”
Neither group is wrong, exactly. But the difference in framing reveals a fundamental shift in how accessibility is conceptualized — from a user-centered practice rooted in empathy to a technical compliance exercise rooted in automation.
What Automation Actually Tests (And What It Doesn’t)
To understand why automated testing creates a false sense of security, you need to understand the specific gap between what tools can evaluate and what accessibility actually requires.
WCAG 2.2, the current standard, contains 86 success criteria across four principles: Perceivable, Operable, Understandable, and Robust. Automated tools can fully evaluate approximately 25-30 of these criteria — roughly the ones that involve checking whether specific HTML attributes or CSS properties exist and meet defined thresholds.
Here’s a simplified view of what falls on each side of the automation divide:
pie title "WCAG 2.2 Success Criteria by Testability"
"Fully automatable" : 27
"Partially automatable" : 22
"Requires human judgment" : 37
The fully automatable criteria include things like:
- 1.1.1 Non-text Content (partially — tools can check if alt text exists, but not if it’s meaningful)
- 1.4.3 Contrast (Minimum) — a pure mathematical calculation (a worked sketch of the formula follows this list)
- 4.1.1 Parsing — HTML validation (removed as obsolete in WCAG 2.2, but long a staple of automated checks)
- 1.3.1 Info and Relationships (partially — tools can check for semantic HTML, but not whether semantics match visual presentation)
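For the mathematically checkable criteria, the check really is just arithmetic. Here is a minimal sketch of the contrast-ratio calculation behind 1.4.3, using the WCAG 2.x relative-luminance formula; the hex values at the end are illustrative.
// contrast.ts: the WCAG 2.x contrast-ratio math that tools like axe-core and Lighthouse automate.

// Convert an 8-bit sRGB channel to its linearized value per the WCAG definition.
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of an sRGB color.
function relativeLuminance(r: number, g: number, b: number): number {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio between two colors: (L_lighter + 0.05) / (L_darker + 0.05).
function contrastRatio(fg: [number, number, number], bg: [number, number, number]): number {
  const l1 = relativeLuminance(...fg);
  const l2 = relativeLuminance(...bg);
  const [lighter, darker] = l1 >= l2 ? [l1, l2] : [l2, l1];
  return (lighter + 0.05) / (darker + 0.05);
}

// #767676 on white is roughly 4.54:1, just clearing the 4.5:1 AA threshold for
// normal-size text. That pass/fail line is exactly what a scanner can verify.
console.log(contrastRatio([0x76, 0x76, 0x76], [0xff, 0xff, 0xff]).toFixed(2));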
The criteria that require human judgment include:
- 1.3.3 Sensory Characteristics — does content rely on shape, color, size, visual location, orientation, or sound?
- 2.4.6 Headings and Labels — are they descriptive? (Tools can check they exist, not that they make sense)
- 3.1.5 Reading Level — is the content written at a level appropriate for the audience?
- 3.3.3 Error Suggestion — does the system provide helpful suggestions when users make errors?
- 2.4.1 Bypass Blocks — can users skip repeated content? (Tools can check for skip links, but not whether they’re useful)
The partially automatable criteria are perhaps the most dangerous category, because they give designers and developers the impression that a criterion has been fully tested when only its mechanical aspect has been checked. Alt text is the canonical example: an automated tool can verify that every image has an alt attribute, but it cannot determine whether alt="IMG_4392.jpg" or alt="image" or even alt="a photo" actually conveys the meaning and function of the image.
I’ve seen real-world examples that illustrate the problem. A major e-commerce site where every product image had alt text — passing automated checks — but the text was auto-generated from database fields: alt="SKU-29843-BLK-M". An airline booking site where error messages existed (passing 3.3.1) but said nothing more than “Invalid input” for every possible error.
These aren’t hypothetical. They’re the predictable outcome of treating accessibility as a checklist rather than a design discipline.
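Teams that want more than the bare attribute check sometimes layer heuristics on top. The sketch below is hypothetical: it flags alt text that is almost certainly meaningless, and even then it cannot tell whether plausible-looking alt text actually describes the image.
// altTextLint.ts: a hypothetical heuristic layered on top of the basic
// "alt attribute exists" check. It flags values that are almost certainly
// meaningless, but it still cannot judge whether plausible-looking alt text
// actually conveys the meaning and function of the image.

const SUSPICIOUS_ALT_PATTERNS: RegExp[] = [
  /^(image|photo|picture|graphic|icon)$/i, // generic placeholder words
  /\.(jpe?g|png|gif|webp|svg)$/i,          // looks like a filename
  /^(img|dsc|screenshot)[-_ ]?\d+/i,       // camera or export naming patterns
  /^[A-Z0-9]+(-[A-Z0-9]+){2,}$/,           // SKU-like codes such as SKU-29843-BLK-M
];

export function flagSuspiciousAltText(doc: Document): HTMLImageElement[] {
  // Decorative images with alt="" are skipped; an empty alt is valid there.
  return Array.from(doc.querySelectorAll<HTMLImageElement>('img[alt]'))
    .filter((img) => img.alt.trim().length > 0)
    .filter((img) => SUSPICIOUS_ALT_PATTERNS.some((pattern) => pattern.test(img.alt.trim())));
}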
The Empathy Deficit
The most significant casualty of automated accessibility testing isn’t technical quality — it’s imaginative capacity. The ability and willingness of designers to imagine experiences fundamentally different from their own.
Empathetic design means developing a rich mental model of how different disabilities affect interaction with digital products. Understanding that blindness isn’t just “can’t see the screen” — it’s navigating by heading structure, relying on link text for decisions, building mental maps through sequential audio. Understanding that motor impairments aren’t just “can’t use a mouse” — it’s fatigue from excessive keyboard navigation, the impossibility of targeting small elements, frustration with timed forms.
This understanding used to develop through direct exposure. Designers watched screen reader demos. They navigated their products using only a keyboard. Some spent time with disabled users, observing real workarounds and adaptations. These experiences were transformative because they made disability real in a way that reading specifications never could.
Automated testing short-circuits this process. When the path to “accessible” runs through a CI/CD pipeline rather than through human interaction, there’s no opportunity for experiential learning. The developer never struggles with a screen reader. The designer never watches a user with tremors try to hit a tiny close button.
My cat — a British lilac with an aristocratic disdain for anything that moves too fast — actually provided an unexpected metaphor for this problem. She reacts to the world based on direct sensory experience: she knows the vacuum cleaner is terrifying because she’s heard it. No amount of describing the vacuum cleaner in abstract terms would produce the same reaction. Empathy works similarly. You can describe disability experiences in documentation, specifications, and automated test rules, but the understanding that drives truly accessible design comes from something closer to direct experience.
A 2026 study from the Nielsen Norman Group examined this phenomenon directly. They compared two groups of design teams working on the same product: one that relied exclusively on automated accessibility testing, and one that supplemented automated testing with monthly sessions observing disabled users interact with the product. After six months, the products were evaluated by a panel of disabled users and accessibility experts.
The results were unambiguous. The team with user exposure produced a product that scored 34% higher on a holistic accessibility quality scale — not because they caught more automated-test violations (both teams scored similarly on those), but because they made dozens of design decisions that anticipated user needs the automated tests never measured. They chose larger touch targets not because a rule required it but because they’d watched someone struggle. They wrote more descriptive error messages not because a criterion demanded specific wording but because they’d seen confusion on a user’s face. They simplified navigation not because a complexity metric flagged it but because they’d observed cognitive overload in real time.
The team relying solely on automation, meanwhile, produced a product that was technically compliant but, in the words of one evaluator, “felt like it was designed by someone who had read about disabilities but never met a disabled person.”
The Overlay Problem
The most extreme manifestation of this trend is accessibility overlay widgets — products like AccessiBe and UserWay that promise to make any website accessible with a single line of JavaScript. The National Federation of the Blind called overlays “misguided” in 2021, noting they often interfere with existing assistive technologies. A 2025 investigation found that none of six major overlay products improved actual accessibility — several made things measurably worse.
The legal record is equally revealing. Websites using overlay widgets were actually sued at a higher rate than those without them between 2022 and 2027. But what’s most telling about overlays isn’t their technical failure — it’s what their popularity reveals about organizational attitudes. An organization that chooses an overlay is saying: we want the appearance of accessibility without investing in understanding what it actually requires.
This mentality pervades the broader ecosystem too. axe-core is genuinely excellent. But the organizational mindset that says “we have axe in our pipeline, so we’re covered” is cousin to “we have an overlay, so we’re compliant.” Both treat accessibility as something to be automated away rather than practiced.
The Knowledge Gap Gets Wider
One of the most alarming trends revealed by our research is the growing gap between accessibility knowledge that practitioners think they have and the knowledge they actually possess. Automated testing creates a Dunning-Kruger effect specific to accessibility: the tools make people feel competent while simultaneously preventing them from developing genuine competence.
The 2027 Global Accessibility Practitioner Survey included a knowledge assessment component alongside self-reported confidence ratings. The results were striking:
- 78% of front-end developers rated their accessibility knowledge as “good” or “excellent”
- On the actual knowledge assessment, the average score was 42% — barely better than random guessing on many questions
- Developers who relied primarily on automated testing scored an average of 37%, compared to 61% for those who regularly conducted manual testing
- The biggest knowledge gaps were in cognitive accessibility, error handling patterns, and content readability — precisely the areas that automated tools don’t cover
This confidence-competence gap has real consequences. When developers believe they already understand accessibility well enough, they don’t seek out additional learning. They don’t attend conferences or workshops. They don’t read the actual WCAG specifications (which, despite their reputation, are remarkably well-written and include extensive guidance about the intent behind each criterion). They certainly don’t seek out interactions with disabled users.
The result is a self-reinforcing cycle: automated tools create an illusion of competence, which reduces motivation to learn, which increases dependence on automated tools, which further reduces actual competence. Each rotation of this cycle widens the gap between what practitioners think they know and what they need to know to create genuinely accessible products.
graph TD
A["Automated Testing Adoption"] --> B["Perceived Competence Rises"]
B --> C["Reduced Motivation to Learn"]
C --> D["Declining Empathy & Manual Skills"]
D --> E["Greater Dependence on Automation"]
E --> A
D --> F["Widening Knowledge Gap"]
F --> G["Genuinely Inaccessible Products with Passing Scores"]
What We Actually Lost
Let me be specific about the skills and practices that have atrophied as automated testing became dominant. This isn’t an abstract concern — each of these represents a concrete capability that directly affects the quality of accessible design.
Screen reader literacy. In 2018, it was common for front-end developers at accessibility-conscious organizations to have basic proficiency with at least one screen reader — typically NVDA on Windows or VoiceOver on macOS. They could navigate a page, understand how different HTML structures were announced, and identify problems that no automated tool would catch. By 2027, survey data suggests fewer than 15% of front-end developers have ever used a screen reader for more than a brief demonstration, and fewer than 5% could navigate a complex web application with one.
Keyboard testing intuition. Keyboard accessibility is partially automatable — tools can check for focusable elements and basic tab order. But the experience of keyboard navigation involves subjective qualities like efficiency, predictability, and cognitive load that only a human can evaluate. Developers who regularly keyboard-tested their own work developed an intuition for what felt right: sensible focus order, visible focus indicators, logical shortcut keys, appropriate use of landmark regions for quick navigation. This intuition is disappearing as keyboard testing is reduced to automated focus-order checks.
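The automatable half of that work is easy to script. The sketch below assumes Playwright and uses a placeholder URL; it records the sequence of focus stops as a user presses Tab, but judging whether that sequence is efficient, predictable, or exhausting remains a human call.
// tabOrder.spec.ts: a sketch of the automatable half of keyboard testing, assuming Playwright.
import { test } from '@playwright/test';

test('record the tab order of the checkout page', async ({ page }) => {
  // Placeholder URL for illustration.
  await page.goto('https://example.com/checkout');

  const focusSequence: string[] = [];
  for (let i = 0; i < 40; i++) { // arbitrary cap on Tab presses
    await page.keyboard.press('Tab');
    const description = await page.evaluate(() => {
      const el = document.activeElement as HTMLElement | null;
      if (!el || el === document.body) return '(body)';
      const label = el.getAttribute('aria-label') ?? el.textContent?.trim() ?? '';
      return `${el.tagName.toLowerCase()} "${label.slice(0, 40)}"`;
    });
    focusSequence.push(description);
  }

  // The script can list the stops; only a person can say whether the order is
  // logical, whether focus is visible at each stop, and whether forty presses
  // is a reasonable cost for reaching the primary action.
  console.log(focusSequence.join('\n'));
});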
Content accessibility awareness. Perhaps the least automatable aspect is content itself. Is the language clear? Are instructions unambiguous? Do link texts make sense out of context? These questions require understanding of how cognitive disabilities and low literacy affect comprehension. Automated tools can flag reading level scores, but can’t tell you whether your instructions will confuse someone with ADHD.
Disability awareness more broadly. When accessibility was hands-on, practitioners developed understanding of the diversity of disability experiences — from permanent blindness to temporary injuries, from severe cognitive impairments to mild attention difficulties. As accessibility becomes automated compliance, this broader awareness is being lost.
The Legal Compliance Trap
Organizations often frame accessibility in legal terms — and the law has been a powerful driver of improvements. The ADA, the European Accessibility Act, and similar legislation worldwide have created real consequences for inaccessible products. But the legal framing creates a dangerous incentive: achieve the minimum compliance that protects against litigation, rather than the level of accessibility that genuinely serves users.
Automated testing fits perfectly into this legal-minimum mindset. Run the scanner, fix the flagged issues, document the results, file them away for the next audit. If a lawsuit comes, you have evidence of due diligence. The fact that your product is still functionally inaccessible for many users with disabilities is legally irrelevant, as long as you’ve addressed the technically defined standards.
This is the accessibility equivalent of a restaurant that passes health inspection but serves terrible food. The inspection checks for specific, measurable hazards — temperature, cleanliness, pest control. It doesn’t check whether the food is nourishing, flavorful, or prepared with care. Similarly, WCAG compliance — especially the subset of WCAG that can be automated — checks for specific, measurable accessibility hazards. It doesn’t check whether the product is usable, pleasant, or designed with an understanding of disability.
The legal compliance trap is particularly insidious because it provides cover for organizations with no genuine interest in accessibility. They can point to automated test results, compliance dashboards, documented remediation processes — all the artifacts of caring without any of the substance.
Until, of course, a disabled user encounters the gap. And then it’s not a legal problem or a technical problem — it’s a human problem, experienced by a human being who just wanted to buy a plane ticket or file their taxes or read the news, and couldn’t, despite every automated test saying they should be able to.
The Path Forward (It Involves Actual Humans)
I want to be clear: I’m not arguing against automated accessibility testing. axe-core is a brilliant tool. Running accessibility checks in CI/CD pipelines is a genuine best practice. Automated monitoring dashboards provide valuable early warning systems for regression. These tools have unquestionably made the web more accessible than it would be without them.
What I’m arguing against is the idea that these tools are sufficient — that running the scanner and fixing the flags constitutes taking accessibility seriously. The tools are a floor, not a ceiling. They’re the spell-checker of accessibility: essential for catching mechanical errors, useless for ensuring the writing is actually good.
The organizations that produce genuinely accessible products share several practices that go beyond automation:
Regular user testing with disabled participants. Not annually. Not when a lawsuit threatens. Regularly — monthly or quarterly — as a standard part of the design and development process. This testing doesn’t need to be elaborate. Five users, forty-five minutes each, observing them use the product and noting where they struggle. The insights from even this modest investment consistently exceed what any automated tool can provide.
Experiential training for teams. Extended exercises where designers and developers use assistive technologies to complete real tasks — navigating their own product with a screen reader, completing a purchase using only a keyboard, using their app with screen magnification at 200%. These experiences create embodied understanding that no specification can provide.
Accessibility champions embedded in teams. Designated individuals within each product team who bring accessibility perspectives into design discussions, code reviews, and sprint planning — living reminders that accessibility is a continuous practice.
Inclusive design from the start. The most accessible products aren’t retrofitted — they’re designed with accessibility as a foundational constraint from the earliest sketches. This means involving disabled users in research, considering accessibility in information architecture, and treating assistive technology compatibility as a first-class requirement.
None of these practices are revolutionary. They’re what accessibility professionals have advocated for decades. But they require time, attention, and willingness to engage with unfamiliar experiences.
The irony of our current moment is that we have better automated accessibility tools than ever, yet the average web experience for disabled users hasn’t improved proportionally. The WebAIM Million shows fewer automated-test violations, yes. But qualitative research consistently reports that the web feels no more usable, no more welcoming than it did five years ago.
The tools got smarter. The scores got higher. And the empathy that once drove the accessibility movement — the genuine desire to build technology that works for everyone — got automated out of the process.
I think we can do better. But doing better means acknowledging that the hardest parts of accessibility are precisely the parts that can’t be automated. And that the green checkmark on the dashboard, satisfying as it is, tells us remarkably little about whether we’ve actually succeeded.
Reclaiming the Human Side
The most hopeful finding from our research is that empathy atrophy is reversible. Practitioners who re-engaged with manual testing and user observation reported rapid deepening of their understanding — often within just a few sessions.
One developer described returning to manual screen reader testing after three years of relying on automation: “It was like taking off noise-cancelling headphones. Suddenly I could hear all the things I’d been missing. The awkward announcements, the confusing navigation jumps, the meaningless link text.”
That moment of re-hearing is available to anyone willing to close the laptop, open a screen reader, and listen. The automation isn’t going away, nor should it. But it needs to be put back in its proper place: as a useful first step in a process that ultimately requires human judgment, empathy, and willingness to engage with experiences different from our own. The checkmark is the beginning of the conversation, not the end of it.