NaviSense: A Multimodal Assistive Mobile Application for Object Retrieval by Persons with Visual Impairment

📅 2025-09-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Visually impaired individuals face a fundamental trade-off between precise spatial guidance and open-world object recognition in unstructured environments: existing solutions either rely on pre-scanned scenes or restricted object categories, or offer open-world recognition without fine-grained spatial feedback. This paper introduces the first end-to-end mobile system that jointly enables open-vocabulary object recognition and real-time spatial navigation. It leverages large vision-language models to interpret natural-language target descriptions, integrates LiDAR-based depth sensing with AR-enabled spatial mapping, and delivers multimodal audio-haptic feedback for centimeter-accurate guidance, all without prior environment setup or object-category constraints. Its key innovation is unifying open-world perception with high-precision spatial reasoning in a lightweight mobile architecture. In user studies with 12 blind and low-vision participants, the system significantly reduced object retrieval time and achieved higher user preference than state-of-the-art baselines, demonstrating both efficacy and practical viability.

📝 Abstract
People with visual impairments often face significant challenges in locating and retrieving objects in their surroundings. Existing assistive technologies present a trade-off: systems that offer precise guidance typically require pre-scanning or support only fixed object categories, while those with open-world object recognition lack spatial feedback for reaching the object. To address this gap, we introduce 'NaviSense', a mobile assistive system that combines conversational AI, vision-language models, augmented reality (AR), and LiDAR to support open-world object detection with real-time audio-haptic guidance. Users specify objects via natural language and receive continuous spatial feedback to navigate toward the target without needing prior setup. Designed with insights from a formative study and evaluated with 12 blind and low-vision participants, NaviSense significantly reduced object retrieval time and was preferred over existing tools, demonstrating the value of integrating open-world perception with precise, accessible guidance.
Problem

Research questions and friction points this paper addresses.

Enabling visually impaired individuals to locate objects via natural language commands
Providing real-time spatial feedback for object retrieval without prior setup
Combining open-world object detection with precise audio-haptic guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines conversational AI with vision-language models
Integrates LiDAR and AR for spatial feedback
Provides real-time audio-haptic guidance for navigation
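The components above form a sense-then-guide loop: the VLM resolves the spoken target, LiDAR/AR tracking yields its 3D position relative to the device, and that geometry is mapped to audio-haptic cues. The paper does not publish its mapping logic; the sketch below is a hypothetical illustration of the final step, with made-up thresholds (20° turn tolerance, 0.3 m arm's-length cutoff) and a simple distance-to-pulse-rate rule.

```python
import math

def guidance_cues(target_xyz, device_xyz, device_heading_rad):
    """Illustrative mapping from a tracked target position (e.g. from
    LiDAR/AR) to audio-haptic cues; not the paper's implementation."""
    dx = target_xyz[0] - device_xyz[0]
    dz = target_xyz[2] - device_xyz[2]
    distance = math.hypot(dx, dz)                      # planar range to target
    bearing = math.atan2(dx, dz) - device_heading_rad  # signed heading offset
    bearing = (bearing + math.pi) % (2 * math.pi) - math.pi  # wrap to (-pi, pi]
    if abs(bearing) > math.radians(20):                # assumed turn tolerance
        cue = "turn left" if bearing < 0 else "turn right"
    elif distance > 0.3:                               # assumed reach cutoff (m)
        cue = "move forward"
    else:
        cue = "reach out"
    # Haptic pulses speed up as the user closes in, capped at 10 Hz.
    pulse_hz = min(10.0, 2.0 / max(distance, 0.05))
    return cue, round(distance, 2), pulse_hz
```

A loop like this would run per AR frame, re-deriving the cue as the tracked pose updates, so feedback stays continuous without any pre-scanned map.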