🤖 AI Summary
This study addresses the challenge of web accessibility remediation at scale, where rule-based tools offer limited coverage and manual remediation is costly. It evaluates large language model (LLM) agents, exemplified by Kimi K2.5, on detecting and repairing accessibility issues, benchmarked against traditional rule-based methods. Using F1 scores, structural validity checks, and API cost analysis, the work shows that the LLM matches rule-based tools on detection overall (F1 around 0.65) and is strongest on violations that require semantic understanding (F1 = 0.83), but is less reliable on syntactic and layout-related violations. Its fixes improve compliance in 80.2% of instances and reduce average violations from 3.98 to 1.7 per file. However, fewer than 26% of cases are fully resolved, and about 30% of patches introduce structural changes. Iterative agent-based refinement raises computational cost by 52% and API usage by a factor of 1.64 without improving outcomes. The findings highlight both the promise and the limits of LLMs and argue for a hybrid remediation framework that pairs LLM repair with rule-based validation.
📝 Abstract
Ensuring web accessibility at scale remains challenging because rule-based tools provide limited coverage, while manual remediation is costly and error-prone. This paper evaluates large language model (LLM) based agents, specifically Kimi K2.5, for automated accessibility detection and repair, and compares them with rule-based approaches. For detection, the LLM performs comparably to rule-based tools overall (F1 around 0.65). It is strongest on violations that require semantic understanding (F1 of 0.83) and less reliable on syntactic and layout-related violations. For remediation, LLM-generated fixes are syntactically valid in over 99.7 percent of cases and improve accessibility compliance in 80.2 percent of instances, reducing the average number of violations from 3.98 to 1.7 per file. However, fewer than 26 percent of cases are fully resolved, and about 30 percent of patches introduce structural changes. We also find that iterative agent-based refinement increases computational cost by 52 percent and API usage by a factor of 1.64 without improving remediation outcomes. These findings indicate that while LLMs are effective for partial accessibility repair, they are insufficient for complete and reliable remediation. Scalable accessibility solutions require hybrid approaches that combine LLM capabilities with rule-based validation and constraint-aware correction mechanisms.
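To put the reported remediation figures in perspective, the short sketch below works through the arithmetic behind the drop from 3.98 to 1.7 violations per file and includes the standard F1 definition used for the detection scores. The function names are illustrative and not taken from the paper, and the abstract does not report the precision and recall values behind its F1 scores, so `f1_score` is shown as a definition only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity drops going from `before` to `after`."""
    return (before - after) / before


# Mean violations per file, as reported in the abstract.
violations_before = 3.98
violations_after = 1.7

absolute_drop = violations_before - violations_after
relative_drop = relative_reduction(violations_before, violations_after)

print(f"absolute drop: {absolute_drop:.2f} violations per file")
print(f"relative drop: {relative_drop:.1%}")
```

Under these figures the average file loses about 2.28 violations, a relative reduction of roughly 57 percent. That is consistent with the abstract's reading of LLM repair as effective but partial, since fewer than 26 percent of cases reach zero remaining violations.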