AI Summary
Blind and low-vision users face significant challenges in effectively exploring and comparing 3D models due to the lack of accessible, interactive, and semantically rich interfaces.
Method: This paper introduces SweeperBot, an assistive framework that integrates visual question answering (VQA) into screen reader environments. It combines an optimal viewpoint selection algorithm with generative and recognition-oriented multimodal foundation models to produce fine-grained, intent-driven, and context-aware natural-language descriptions of 3D content in response to user queries.
Contribution/Results: Unlike static alt-text, SweeperBot supports active interaction, contextual understanding, and cross-model comparison. An expert review with ten Blind and Low-Vision (BLV) screen reader users demonstrated the feasibility of independently exploring and comparing 3D models with SweeperBot, and a survey with thirty sighted participants validated the quality of the generated descriptions. This work establishes a VQA-based paradigm for accessible 3D interaction, advancing the accessibility and usability of 3D content.
Abstract
Accessing 3D models remains challenging for Screen Reader (SR) users. While some existing 3D viewers allow creators to provide alternative text, it often lacks sufficient detail about the 3D models. Grounded in a formative study, this paper introduces SweeperBot, a system that enables SR users to leverage visual question answering to explore and compare 3D models. SweeperBot answers SR users' visual questions by combining an optimal view selection technique with the strengths of generative- and recognition-based foundation models. An expert review with 10 Blind and Low-Vision (BLV) users with SR experience demonstrated the feasibility of using SweeperBot to assist BLV users in exploring and comparing 3D models. The quality of the descriptions generated by SweeperBot was validated by a second survey study with 30 sighted participants.