AI Summary
Blind and low-vision users face significant challenges in effectively exploring and comparing 3D models due to the lack of accessible, interactive, and semantically rich interfaces.
Method: This paper introduces SweeperBot, an assistive framework that integrates visual question answering (VQA) into screen reader environments. It combines an optimal viewpoint selection algorithm with generative and recognition-oriented multimodal foundation models to produce fine-grained, intent-driven, and context-aware natural-language descriptions of 3D content in response to user queries.
Contribution/Results: Unlike static alt-text, SweeperBot supports active interaction, contextual understanding, and cross-model comparison. An expert review with ten Blind and Low-Vision (BLV) screen reader users demonstrated the feasibility of independently exploring and comparing 3D models with SweeperBot, and a survey with thirty sighted participants validated the quality of the generated descriptions. This work establishes a VQA-based paradigm for accessible 3D interaction, advancing the accessibility and usability of 3D content.
Abstract
Accessing 3D models remains challenging for Screen Reader (SR) users. While some existing 3D viewers allow creators to provide alternative text, it often lacks sufficient detail about the 3D models. Grounded in a formative study, this paper introduces SweeperBot, a system that enables SR users to leverage visual question answering to explore and compare 3D models. SweeperBot answers SR users' visual questions by combining an optimal view selection technique with the strengths of generative- and recognition-based foundation models. An expert review with 10 Blind and Low-Vision (BLV) users with SR experience demonstrated the feasibility of using SweeperBot to assist BLV users in exploring and comparing 3D models. The quality of the descriptions generated by SweeperBot was validated by a second survey study with 30 sighted participants.