🤖 AI Summary
Existing indoor navigation systems for visually impaired individuals rely heavily on fixed infrastructure (e.g., beacons) and exhibit poor adaptability to dynamic environments.
Method: This paper introduces the first LLM-driven end-to-end floorplan parsing framework that automatically converts architectural floorplans into structured knowledge graphs and generates natural-language navigation instructions—eliminating dependence on physical infrastructure. The approach combines multimodal foundation models (including Claude 3.7 Sonnet), few-shot prompting, and graph-structured spatial modeling, replacing handcrafted rules and pixel-level visual reasoning with a knowledge-graph-based representation.
Contribution/Results: Evaluated on the MP-1 dataset under a 5-shot setting, the framework achieves navigation accuracy of 92.3%, 76.9%, and 61.5% for short, medium, and long paths, respectively. Graph-structured modeling improves navigation success rate by 15.4% over direct visual reasoning, significantly enhancing robustness in dynamic environments and cross-scenario generalization.
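To illustrate the graph-structured spatial modeling described above, here is a minimal, self-contained sketch: a room-connectivity graph (as the framework might extract from a floor plan) is searched with Dijkstra's algorithm and the resulting path is verbalized into step-by-step instructions. All room names, distances, and directions below are hypothetical, not taken from the paper or the MP-1 dataset.

```python
from heapq import heappush, heappop

# Hypothetical connectivity graph extracted from a floor plan: each edge
# stores (neighbor, distance in meters, direction of travel).
GRAPH = {
    "Entrance": [("Lobby", 5, "straight")],
    "Lobby": [("Entrance", 5, "straight"), ("Hallway", 8, "left")],
    "Hallway": [("Lobby", 8, "right"), ("Restroom", 4, "right"),
                ("Office 101", 6, "straight")],
    "Restroom": [("Hallway", 4, "left")],
    "Office 101": [("Hallway", 6, "straight")],
}

def shortest_path(graph, start, goal):
    """Dijkstra over the room graph; returns the list of rooms visited."""
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heappop(queue)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nbr, dist, _ in graph[node]:
            if nbr not in seen:
                heappush(queue, (cost + dist, nbr, path + [nbr]))
    return None

def verbalize(graph, path):
    """Turn consecutive rooms on the path into human-readable steps."""
    steps = []
    for a, b in zip(path, path[1:]):
        dist, direction = next((d, dr) for n, d, dr in graph[a] if n == b)
        steps.append(f"From {a}, go {direction} for about {dist} m to reach {b}.")
    return steps

path = shortest_path(GRAPH, "Entrance", "Restroom")
for step in verbalize(GRAPH, path):
    print(step)
```

In the paper's pipeline the graph itself is produced by the LLM from the floorplan image; this sketch only shows why a graph representation helps: once the layout is a graph, route finding and instruction generation become deterministic, rather than relying on pixel-level visual reasoning per query.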
📝 Abstract
Indoor navigation remains a critical challenge for people with visual impairments. Current solutions rely mainly on infrastructure-based systems, which limits users' ability to navigate safely in dynamic environments. We propose a novel navigation approach that uses a foundation model to transform floor plans into navigable knowledge graphs and generate human-readable navigation instructions. Floorplan2Guide integrates a large language model (LLM) to extract spatial information from architectural layouts, reducing the manual preprocessing required by earlier floorplan parsing methods. Experimental results indicate that few-shot learning improves navigation accuracy over zero-shot learning in both simulated and real-world evaluations. Claude 3.7 Sonnet achieves the highest accuracy among the evaluated models, with 92.31%, 76.92%, and 61.54% on the short, medium, and long routes, respectively, under 5-shot prompting on the MP-1 floor plan. Across all models, the graph-based spatial representation achieves a 15.4% higher success rate than direct visual reasoning, confirming that graphical representation and in-context learning enhance navigation performance and make our solution more precise for the indoor navigation of Blind and Low Vision (BLV) users.
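The few-shot (in-context) prompting the abstract credits for the accuracy gain can be sketched as prompt assembly: k worked examples pairing a floor-plan description with its expected knowledge-graph output are prepended to the new query. The example shots and the JSON graph schema below are illustrative assumptions, not the paper's actual prompt format.

```python
# Hypothetical few-shot shots: each pairs a floor-plan description with
# the knowledge graph (as JSON) the model should produce for it.
SHOTS = [
    {
        "plan": "Two rooms, A and B, joined by a door.",
        "graph": '{"nodes": ["A", "B"], "edges": [["A", "B"]]}',
    },
    {
        "plan": "Room C opens onto a corridor D; D also reaches E.",
        "graph": '{"nodes": ["C", "D", "E"], "edges": [["C", "D"], ["D", "E"]]}',
    },
]

def build_prompt(shots, query_plan):
    """Concatenate k worked examples, then the new plan to parse."""
    parts = ["Convert each floor plan into a JSON knowledge graph."]
    for s in shots:
        parts.append(f"Floor plan: {s['plan']}\nGraph: {s['graph']}")
    # The query ends at "Graph:" so the model completes the JSON.
    parts.append(f"Floor plan: {query_plan}\nGraph:")
    return "\n\n".join(parts)

prompt = build_prompt(SHOTS, "Lobby F connects to offices G and H.")
print(prompt)
```

In the actual system the "plan" would be a floorplan image sent to a multimodal model such as Claude 3.7 Sonnet; moving from 0 shots to 5 shots simply means including more such worked examples in the request.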