🤖 AI Summary
To address the accessibility challenges that blind and low-vision (BLV) users face in comprehending, locating, and engaging with video danmaku (real-time overlaid comments), this paper introduces the first multi-voice audio discussion paradigm for danmaku vocalization. The approach comprises three core technical components: context-enhanced semantic parsing, non-intrusive audio–video fusion, and socially informed multi-track audio organization. The system integrates text-to-speech synthesis, time-synchronized summarization, spatial audio rendering, and preference-driven scheduling to enable real-time, spatialized, and personalized danmaku vocalization. In an evaluation with 12 BLV participants, the system improved danmaku comprehension accuracy by 68% and achieved a viewing fluency rating of 4.6/5, and 92% of users reported a significantly stronger sense of co-presence and community belonging. The paper presents this as the first faithful reproduction of danmaku's social interaction at the auditory level.
📝 Abstract
By overlaying time-synced user comments on videos, Danmu creates a co-watching experience for online viewers. However, its visual-centric design poses significant challenges for blind and low-vision (BLV) viewers. Our formative study identified three primary challenges that hinder BLV viewers' engagement with Danmu: the lack of visual context, speech interference between comments and videos, and the disorganization of comments. To address these challenges, we present DanmuA11y, a system that makes Danmu accessible by transforming it into multi-viewer audio discussions. DanmuA11y incorporates three core features: (1) augmenting Danmu with visual context, (2) seamlessly integrating Danmu into videos, and (3) presenting Danmu via multi-viewer discussions. An evaluation with twelve BLV viewers demonstrated that DanmuA11y significantly improved Danmu comprehension, provided smooth viewing experiences, and fostered social connections among viewers. We further highlight implications for enhancing commentary accessibility in video-based social media and live-streaming platforms.