🤖 AI Summary
Visually impaired individuals face a fundamental trade-off between precise spatial guidance and open-world object recognition in unstructured environments: existing solutions either rely on pre-scanned scenes or restricted object categories, or lack fine-grained spatial feedback. This paper introduces NaviSense, an end-to-end mobile system that jointly enables open-vocabulary object recognition and real-time spatial navigation. It leverages large vision-language models to interpret natural-language target descriptions, integrates LiDAR-based depth sensing with AR-enabled spatial mapping, and delivers multimodal audio-haptic feedback for precise guidance, all without prior environment setup or object-category constraints. Its key innovation lies in unifying open-world perception with high-precision spatial reasoning within a lightweight mobile architecture. In a user study with 12 blind and low-vision participants, the system significantly reduced object retrieval time and was preferred over state-of-the-art baselines, demonstrating both efficacy and practical viability.
📝 Abstract
People with visual impairments often face significant challenges in locating and retrieving objects in their surroundings. Existing assistive technologies present a trade-off: systems that offer precise guidance typically require pre-scanning or support only fixed object categories, while those with open-world object recognition lack spatial feedback for reaching the object. To address this gap, we introduce 'NaviSense', a mobile assistive system that combines conversational AI, vision-language models, augmented reality (AR), and LiDAR to support open-world object detection with real-time audio-haptic guidance. Users specify objects via natural language and receive continuous spatial feedback to navigate toward the target without needing prior setup. Designed with insights from a formative study and evaluated with 12 blind and low-vision participants, NaviSense significantly reduced object retrieval time and was preferred over existing tools, demonstrating the value of integrating open-world perception with precise, accessible guidance.
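To make the sensing loop concrete, here is a minimal sketch (not the authors' implementation) of one way the pieces could fit together on iOS: a 2D target detection from a vision-language model is fused with ARKit raycasting over LiDAR scene depth to drive distance-scaled haptic pulses. The `GuidanceSketch` class, the normalized `detectionCenter` point assumed to come from a VLM, and the 1.5 m intensity cutoff are all illustrative assumptions, not details from the paper.

```swift
import ARKit
import UIKit
import simd

/// Illustrative sketch only: fuses a hypothetical VLM detection (a point in
/// normalized image coordinates) with ARKit raycasting over LiDAR scene
/// depth, producing haptic pulses that strengthen as the phone nears the
/// target object.
final class GuidanceSketch {
    let session = ARSession()
    let haptics = UIImpactFeedbackGenerator(style: .medium)

    func start() {
        let config = ARWorldTrackingConfiguration()
        // LiDAR-equipped devices expose per-pixel depth via scene depth.
        if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
            config.frameSemantics.insert(.sceneDepth)
        }
        session.run(config)
    }

    /// `detectionCenter` is the target's bounding-box center in normalized
    /// image coordinates ((0,0) top-left, (1,1) bottom-right), assumed to
    /// come from an open-vocabulary vision-language model query.
    func guide(toward detectionCenter: CGPoint, in frame: ARFrame) {
        // Cast a ray through the detected pixel into the reconstructed scene.
        let query = frame.raycastQuery(from: detectionCenter,
                                       allowing: .estimatedPlane,
                                       alignment: .any)
        guard let hit = session.raycast(query).first else { return }

        // Distance from the camera to the target's 3D position.
        let cam = simd_make_float3(frame.camera.transform.columns.3)
        let target = simd_make_float3(hit.worldTransform.columns.3)
        let distance = simd_distance(cam, target)

        // Map distance to pulse intensity; the 1.5 m cutoff is an
        // illustrative assumption, not a value from the paper.
        let intensity = max(0, (1.5 - distance) / 1.5)
        if intensity > 0 {
            haptics.impactOccurred(intensity: CGFloat(min(intensity, 1)))
        }
    }
}
```

The full system described in the abstract also layers conversational clarification and continuous audio cues on top of this loop; the sketch covers only the depth-to-haptics step.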