🤖 AI Summary
Existing indoor navigation systems for visually impaired individuals rely heavily on fixed infrastructure (e.g., beacons) and exhibit poor adaptability to dynamic environments.
Method: This paper introduces the first LLM-driven end-to-end floorplan parsing framework that automatically converts architectural floorplans into structured knowledge graphs and generates natural-language navigation instructions—eliminating dependence on physical infrastructure. The approach combines multimodal foundation models (including Claude 3.7 Sonnet), few-shot prompting, and graph-structured spatial modeling, replacing handcrafted rules and pixel-level visual reasoning with a knowledge-graph-based representation.
Contribution/Results: Evaluated on the MP-1 dataset under a 5-shot setting, the framework achieves navigation accuracy of 92.3%, 76.9%, and 61.5% for short, medium, and long paths, respectively. Graph-structured modeling improves navigation success rate by 15.4% over direct visual reasoning, significantly enhancing robustness in dynamic environments and cross-scenario generalization.
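To illustrate the graph-structured spatial modeling described above, here is a minimal, self-contained sketch: a room-connectivity graph (as the framework might extract from a floor plan) is searched with Dijkstra's algorithm and the resulting path is verbalized into step-by-step instructions. All room names, distances, and directions below are hypothetical, not taken from the paper or the MP-1 dataset.

```python
from heapq import heappush, heappop

# Hypothetical connectivity graph extracted from a floor plan: each edge
# stores (neighbor, distance in meters, direction of travel).
GRAPH = {
    "Entrance": [("Lobby", 5, "straight")],
    "Lobby": [("Entrance", 5, "straight"), ("Hallway", 8, "left")],
    "Hallway": [("Lobby", 8, "right"), ("Restroom", 4, "right"),
                ("Office 101", 6, "straight")],
    "Restroom": [("Hallway", 4, "left")],
    "Office 101": [("Hallway", 6, "straight")],
}

def shortest_path(graph, start, goal):
    """Dijkstra over the room graph; returns the list of rooms visited."""
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heappop(queue)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nbr, dist, _ in graph[node]:
            if nbr not in seen:
                heappush(queue, (cost + dist, nbr, path + [nbr]))
    return None

def verbalize(graph, path):
    """Turn consecutive rooms on the path into human-readable steps."""
    steps = []
    for a, b in zip(path, path[1:]):
        dist, direction = next((d, dr) for n, d, dr in graph[a] if n == b)
        steps.append(f"From {a}, go {direction} for about {dist} m to reach {b}.")
    return steps

path = shortest_path(GRAPH, "Entrance", "Restroom")
for step in verbalize(GRAPH, path):
    print(step)
```

In the paper's pipeline the graph itself is produced by the LLM from the floorplan image; this sketch only shows why a graph representation helps: once the layout is a graph, route finding and instruction generation become deterministic, rather than relying on pixel-level visual reasoning per query.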
📝 Abstract
Indoor navigation remains a critical challenge for people with visual impairments. Current solutions rely mainly on infrastructure-based systems, which limits users' ability to navigate safely in dynamic environments. We propose a novel navigation approach that uses a foundation model to transform floor plans into navigable knowledge graphs and generate human-readable navigation instructions. Floorplan2Guide integrates a large language model (LLM) to extract spatial information from architectural layouts, reducing the manual preprocessing required by earlier floorplan parsing methods. Experimental results indicate that few-shot learning improves navigation accuracy over zero-shot learning in both simulated and real-world evaluations. Claude 3.7 Sonnet achieves the highest accuracy among the evaluated models, with 92.31%, 76.92%, and 61.54% on the short, medium, and long routes, respectively, under 5-shot prompting on the MP-1 floor plan. Across all models, the graph-based spatial representation achieves a 15.4% higher success rate than direct visual reasoning, confirming that graphical representation and in-context learning enhance navigation performance and make our solution more precise for the indoor navigation of Blind and Low Vision (BLV) users.
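The few-shot (in-context) prompting the abstract credits for the accuracy gain can be sketched as prompt assembly: k worked examples pairing a floor-plan description with its expected knowledge-graph output are prepended to the new query. The example shots and the JSON graph schema below are illustrative assumptions, not the paper's actual prompt format.

```python
# Hypothetical few-shot shots: each pairs a floor-plan description with
# the knowledge graph (as JSON) the model should produce for it.
SHOTS = [
    {
        "plan": "Two rooms, A and B, joined by a door.",
        "graph": '{"nodes": ["A", "B"], "edges": [["A", "B"]]}',
    },
    {
        "plan": "Room C opens onto a corridor D; D also reaches E.",
        "graph": '{"nodes": ["C", "D", "E"], "edges": [["C", "D"], ["D", "E"]]}',
    },
]

def build_prompt(shots, query_plan):
    """Concatenate k worked examples, then the new plan to parse."""
    parts = ["Convert each floor plan into a JSON knowledge graph."]
    for s in shots:
        parts.append(f"Floor plan: {s['plan']}\nGraph: {s['graph']}")
    # The query ends at "Graph:" so the model completes the JSON.
    parts.append(f"Floor plan: {query_plan}\nGraph:")
    return "\n\n".join(parts)

prompt = build_prompt(SHOTS, "Lobby F connects to offices G and H.")
print(prompt)
```

In the actual system the "plan" would be a floorplan image sent to a multimodal model such as Claude 3.7 Sonnet; moving from 0 shots to 5 shots simply means including more such worked examples in the request.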