AI Summary
Professional audio description (AD) for video content is costly and difficult to scale. Method: This paper introduces a viewer-participatory approach to video accessibility that uses lightweight in-situ prompting to elicit grounded visual commentary from ordinary viewers during YouTube playback. Grounded in the Fogg Behavior Model, we implement CoSight, a Chrome extension that combines visual indicators of accessibility gaps, context-aware pop-up hints, reminders to clarify vague comments, and related captions and comments as reference aids. Contribution/Results: In an exploratory study with 48 sighted participants, 89% of comments included grounded visual descriptions. Follow-up interviews with four blind and low vision (BLV) viewers and four professional AD writers suggest that such comments, while less rigorous than professional AD, can complement it, particularly by conveying visual context and emotional nuance. To our knowledge, this is the first systematic integration of crowdsourced descriptive commentary into video accessibility practice, offering a low-cost pathway toward scalable, inclusive audiovisual content.
Abstract
The rapid growth of online video content has outpaced efforts to make visual information accessible to blind and low vision (BLV) audiences. While professional Audio Description (AD) remains the gold standard, it is costly and difficult to scale across the vast volume of online media. In this work, we explore a complementary approach to broaden participation in video accessibility: engaging everyday video viewers while they watch and comment. We introduce CoSight, a Chrome extension that augments YouTube with lightweight, in-situ nudges to support descriptive commenting. Drawing on Fogg's Behavior Model, CoSight provides visual indicators of accessibility gaps, pop-up hints about what to describe, reminders to clarify vague comments, and related captions and comments as references. In an exploratory study with 48 sighted users, CoSight helped integrate accessibility contribution into natural viewing and commenting practices, and 89% of the resulting comments included grounded visual descriptions. Follow-up interviews with four BLV viewers and four professional AD writers suggest that while such comments do not match the rigor of professional AD, they can offer complementary value by conveying visual context and emotional nuance that help viewers understand the videos.