Perspective from a Broader Context: Can Room Style Knowledge Help Visual Floorplan Localization?

📅 2025-08-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Floorplan localization (FLoc) suffers from ambiguity because floorplans contain repetitive structural elements, such as corridors and corners, that hinder reliable image-to-plan correspondence. Existing approaches rely predominantly on 2D structural matching or on visual pre-training constrained by 3D geometry, overlooking the rich scene semantics embedded in images. To address this, we propose the first FLoc framework guided by room-type priors, built on a room discriminator that is pre-trained with an unsupervised, clustering-based method. The method discovers implicit room-style features from self-collected, unlabeled indoor imagery and integrates them into the FLoc pipeline to add contextual awareness. These room-level semantic cues mitigate structural ambiguity and improve localization confidence and accuracy. On two standard FLoc benchmarks, the method outperforms existing state-of-the-art approaches in both accuracy and robustness.

📝 Abstract
Since a building's floorplan remains consistent over time and is inherently robust to changes in visual appearance, visual Floorplan Localization (FLoc) has received increasing attention from researchers. However, as compact and minimalist representations of a building's layout, floorplans contain many repetitive structures (e.g., hallways and corners) and thus easily lead to ambiguous localization. Existing methods either rely on matching 2D structural cues in floorplans or on 3D geometry-constrained visual pre-training, ignoring the richer contextual information provided by visual images. In this paper, we suggest using broader visual scene context to empower FLoc algorithms with scene layout priors to eliminate localization uncertainty. In particular, we propose an unsupervised learning technique with clustering constraints to pre-train a room discriminator on self-collected unlabeled room images. Such a discriminator can empirically extract the hidden room type of the observed image and distinguish it from other room types. By injecting the scene context information summarized by the discriminator into an FLoc algorithm, the room style knowledge is effectively exploited to guide unambiguous visual FLoc. We conducted extensive comparative studies on two standard visual FLoc benchmarks. Our experiments show that our approach outperforms state-of-the-art methods and achieves significant improvements in robustness and accuracy.
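
The abstract does not spell out the clustering constraints or the discriminator architecture, so the following is only a rough sketch of the general idea of deriving room-type pseudo-labels from unlabeled images. It swaps in plain k-means over precomputed image embeddings as a stand-in for the paper's technique; the toy 2D `embeddings`, the helper names, and the choice of `k` are all assumptions made for illustration.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, iters=10):
    """Plain k-means (a stand-in, not the paper's method).
    Returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each embedding joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, labels

# Hypothetical 2D embeddings of six unlabeled room images (two visual styles).
embeddings = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
              (5.0, 5.1), (5.1, 5.0), (4.9, 5.0)]
centroids, pseudo_labels = kmeans(embeddings, k=2)
print(pseudo_labels)
```

In a pipeline of this kind, cluster indices such as `pseudo_labels` could serve as training targets for a room discriminator without any human annotation; how the paper actually couples clustering and pre-training is not stated in the abstract.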

Problem

Research questions and friction points this paper is trying to address.

Address ambiguous floorplan localization due to repetitive structures
Leverage visual scene context to reduce localization uncertainty
Enhance FLoc accuracy using unsupervised room type discrimination

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised learning with clustering constraints
Room discriminator pre-trained on unlabeled images
Injecting scene context into FLoc algorithm
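
To make the third point concrete, here is a minimal sketch of one way room-level context could disambiguate geometrically similar pose candidates. It is not the paper's injection mechanism, which the abstract does not describe; the multiplicative fusion, the per-pose room assignment `pose_room`, and all numbers are assumptions chosen for illustration.

```python
def fuse_with_room_prior(geom_scores, pose_room, room_probs):
    """Reweight each candidate pose's geometric score by the probability
    that the observed image shows that pose's room type, then normalize.
    (Assumed fusion rule; hypothetical helper.)"""
    weighted = [score * room_probs[pose_room[i]]
                for i, score in enumerate(geom_scores)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("all candidate poses received zero weight")
    return [w / total for w in weighted]

# Two candidates look equally plausible from structure alone (0.4 vs 0.4).
geom_scores = [0.4, 0.4, 0.2]
# Hypothetical room type of the floorplan region containing each candidate.
pose_room = ["bedroom", "kitchen", "hallway"]
# Hypothetical discriminator output for the query image.
room_probs = {"bedroom": 0.1, "kitchen": 0.8, "hallway": 0.1}

posterior = fuse_with_room_prior(geom_scores, pose_room, room_probs)
best = posterior.index(max(posterior))
print(best, [round(p, 3) for p in posterior])
```

The tie between the first two candidates is broken in favor of the pose whose room type matches what the image appears to show, which is the intuition behind using room style knowledge to reduce localization ambiguity.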

Authors

Bolei Chen
School of Computer Science and Engineering, Central South University
Shengsheng Yan
School of Computer Science and Engineering, Central South University
Yongzheng Cui
School of Computer Science and Engineering, Central South University
Jiaxu Kang
School of Computer Science and Engineering, Central South University
Ping Zhong
University of Houston
Jianxin Wang
School of Computer Science and Engineering, Central South University