Extracting polygonal footprints in off-nadir images with Segment Anything Model

📅 2024-08-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
To address the low-quality results of multi-stage building footprint extraction from off-nadir (oblique) remote sensing imagery, this paper proposes OBMv2, an end-to-end, promptable model that predicts polygonal footprints directly instead of relying on segmentation followed by post-processing. Key contributions include: (1) a Self Offset Attention (SOFA) mechanism that improves performance across diverse building types, from bungalows to skyscrapers; (2) a Multi-level Information System (MISS) that combines roof masks, building masks, and offsets for footprint prediction; and (3) a promptable design built on the Segment Anything Model (SAM). The model outputs polygonal building footprints without post-processing. It is evaluated on the BONAI and OmniCity-view3 datasets, and its generalization is tested on the Huizhou test set.

📝 Abstract
Building Footprint Extraction (BFE) from off-nadir aerial images often involves roof segmentation and offset prediction to adjust roof boundaries to the building footprint. However, this multi-stage approach typically produces low-quality results, limiting its applicability in real-world data production. To address this issue, we present OBMv2, an end-to-end and promptable model for polygonal footprint prediction. Unlike its predecessor OBM, OBMv2 introduces a novel Self Offset Attention (SOFA) mechanism that improves performance across diverse building types, from bungalows to skyscrapers, enabling end-to-end footprint prediction without post-processing. Additionally, we propose a Multi-level Information System (MISS) to effectively leverage roof masks, building masks, and offsets for accurate footprint prediction. We evaluate OBMv2 on the BONAI and OmniCity-view3 datasets and demonstrate its generalization on the Huizhou test set. The code will be available at https://github.com/likaiucas/OBMv2.
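The multi-stage baseline described in the abstract (segment the roof, then shift its boundary by a predicted offset) can be illustrated with a minimal sketch. This is not OBMv2 itself. The function names, the pixel coordinate convention, and the assumption that the offset points from roof to footprint are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def roof_to_footprint(roof_polygon: List[Point], offset: Point) -> List[Point]:
    """Translate every roof vertex by a roof-to-footprint offset (dx, dy).

    Assumption: the offset is given in pixels and points from the roof
    toward the footprint. A dataset may use the opposite sign convention.
    """
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in roof_polygon]


def polygon_area(polygon: List[Point]) -> float:
    """Unsigned polygon area via the shoelace formula."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# Hypothetical roof polygon (pixels) and predicted offset.
roof = [(10.0, 20.0), (30.0, 20.0), (30.0, 40.0), (10.0, 40.0)]
footprint = roof_to_footprint(roof, (-4.0, 6.0))
```

Because the shift is a pure translation, the footprint keeps the shape and area of the roof polygon, so any error in the roof mask or in the offset carries straight into the footprint. This is one way to read the abstract's remark that the multi-stage approach yields low-quality results.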
Problem

Research questions and friction points this paper is trying to address.

Extracting precise polygonal building footprints from off-nadir images
Overcoming geometric complexities in off-nadir viewing angles
Improving boundary accuracy without external post-processing steps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Direct polygonal output without post-processing
High-Quality Mask Prompter for precise roofs
Self Offset Attention for accuracy improvement
Kai Li
Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China; School of Electronic Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
Jingbo Chen
Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
Yupeng Deng
aircas
Yu Meng
Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
Diyou Liu
Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
Junxian Ma
Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China; School of Electronic Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
Chenhao Wang
Tencent
Natural Language Processing, Large Language Models