Joint Instance Segmentation and Geometric Attribute Regression for Roof Structures in Aerial Imagery

📅 2026-05-25
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work proposes a unified framework, built on an extended Mask R-CNN, that parses a single aerial orthophoto to produce roof instance segmentation together with regression of three continuous geometric attributes: building height, roof slope, and roof azimuth (orientation). The method adds a dedicated geometric regression branch, a conditional azimuth loss that suppresses supervision on flat roof segments where azimuth labels are noisy, and a log-normalized height representation that addresses the skewed distribution of building heights. With a DINOv3-pretrained ConvNeXt-Base backbone, the model reaches mean absolute errors of approximately 4° for roof slope, 7° for azimuth, and 1 meter for building height on a large-scale Dutch dataset, along with an instance segmentation AP₅₀ of 0.566. The predicted masks and attributes are sufficient to reconstruct simplified LoD2 3D building models.
📝 Abstract
We present a method for jointly predicting instance-level roof segment masks together with three continuous geometric attributes -- building height, roof slope, and roof azimuth -- from a single aerial orthophoto. Our approach extends Mask R-CNN with a dedicated attribute regression branch and introduces two key innovations: a conditional azimuth loss that suppresses supervision for flat roof segments where azimuth labels are inherently noisy, and a log-normalized height representation that addresses the heavily skewed distribution of building heights. We train and evaluate on a large-scale dataset of Dutch aerial images paired with automatically derived ground truth from 3DBAG, a nationwide LiDAR-based 3D building dataset. Using a DINOv3 ConvNeXt-Base backbone, our method achieves a mean absolute error of approximately 4 degrees for roof slope, 7 degrees for azimuth, and 1 meter for building height, with an instance segmentation AP$_{50}$ of 0.566. The predicted per-segment masks and attributes are sufficient to reconstruct simplified 3D building models (LoD2) from a single overhead image, requiring expensive 3D reference data only for training.
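The conditional azimuth loss described in the abstract can be illustrated with a minimal sketch. The abstract states only that supervision is suppressed for flat roof segments; the 5° flatness threshold, the `1 - cos` circular error, and the function name `conditional_azimuth_loss` are assumptions made here for illustration and are not taken from the paper.

```python
import math

# Assumption: a segment whose slope is below this value counts as "flat".
# The threshold is hypothetical; the paper's actual criterion is not given here.
FLAT_SLOPE_DEG = 5.0


def conditional_azimuth_loss(pred_az_deg, true_az_deg, true_slope_deg,
                             flat_thresh_deg=FLAT_SLOPE_DEG):
    """Mean circular azimuth error, computed over non-flat segments only."""
    total, count = 0.0, 0
    for pred, true, slope in zip(pred_az_deg, true_az_deg, true_slope_deg):
        if slope < flat_thresh_deg:
            # Flat roof: the azimuth label is unreliable, so skip supervision.
            continue
        diff = math.radians(pred - true)
        # Assumed error form: 0 when aligned, 2 when opposite.
        # The cosine makes 350 deg vs 10 deg a 20 deg error, not 340 deg.
        total += 1.0 - math.cos(diff)
        count += 1
    # With no sloped segments there is nothing to supervise.
    return total / count if count else 0.0
```

In this sketch, a flat segment with a badly wrong azimuth prediction contributes nothing to the loss, which is the behavior the abstract motivates: noisy azimuth labels on flat roofs do not pull the regression branch in arbitrary directions.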
Problem

Research questions and friction points this paper is trying to address.

instance segmentation
geometric attribute regression
roof structures
aerial imagery
3D building modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

instance segmentation
geometric attribute regression
conditional azimuth loss
log-normalized height representation
3D building reconstruction
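The log-normalized height representation listed above can be sketched as a simple encode/decode pair. The exact transform used in the paper is not specified in this summary; the `log1p` form, the 100 m normalization bound `H_MAX_M`, and the function names are assumptions chosen for illustration.

```python
import math

# Assumption: heights are normalized against a fixed upper bound in meters.
# The value 100.0 is hypothetical and not taken from the paper.
H_MAX_M = 100.0


def encode_height(h_m, h_max=H_MAX_M):
    """Map a height in meters to a log-scaled regression target."""
    # log1p compresses the long tail of tall buildings and keeps 0 m at 0.
    return math.log1p(h_m) / math.log1p(h_max)


def decode_height(t, h_max=H_MAX_M):
    """Invert encode_height: map a regression output back to meters."""
    return math.expm1(t * math.log1p(h_max))
```

Under these assumptions a 10 m building already maps to a target above 0.5, so the common low-rise range occupies most of the target interval instead of being squeezed near zero by a few very tall buildings.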