SAM 3D for 3D Object Reconstruction from Remote Sensing Images

📅 2025-12-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Monocular 3D building reconstruction from remote sensing imagery typically relies on task-specific architectures and dense, labor-intensive supervision. Method: This paper presents the first systematic evaluation of SAM 3D, a general-purpose image-to-3D foundation model, on remote sensing imagery, and extends it to urban scene reconstruction through a segment-reconstruct-compose pipeline. Reconstruction quality is measured with Fréchet Inception Distance (FID) and CLIP-based Maximum Mean Discrepancy (CMMD). Contribution/Results: On samples from the NYC Urban Dataset, SAM 3D produces more coherent roof geometry and sharper boundaries than TRELLIS, and the FID and CMMD scores also favor SAM 3D. The paper analyzes practical limitations, and its findings support the feasibility of applying general-purpose foundation models to urban 3D reconstruction.

📝 Abstract
Monocular 3D building reconstruction from remote sensing imagery is essential for scalable urban modeling, yet existing methods often require task-specific architectures and intensive supervision. This paper presents the first systematic evaluation of SAM 3D, a general-purpose image-to-3D foundation model, for monocular remote sensing building reconstruction. We benchmark SAM 3D against TRELLIS on samples from the NYC Urban Dataset, employing Fréchet Inception Distance (FID) and CLIP-based Maximum Mean Discrepancy (CMMD) as evaluation metrics. Experimental results demonstrate that SAM 3D produces more coherent roof geometry and sharper boundaries compared to TRELLIS. We further extend SAM 3D to urban scene reconstruction through a segment-reconstruct-compose pipeline, demonstrating its potential for urban scene modeling. We also analyze practical limitations and discuss future research directions. These findings provide practical guidance for deploying foundation models in urban 3D reconstruction and motivate future integration of scene-level structural priors.

Problem

Research questions and friction points this paper is trying to address.

Evaluates SAM 3D for monocular 3D building reconstruction from remote sensing images
Compares SAM 3D with TRELLIS using FID and CMMD metrics on urban data
Extends SAM 3D to urban scene reconstruction via a segment-reconstruct-compose pipeline
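
The segment-reconstruct-compose idea can be sketched as a simple loop over building instances. This is only an illustrative outline, not the paper's implementation: the `Instance` and `Mesh` types, the function `segment_reconstruct_compose`, the injected `segment` and `reconstruct` callables, and the choice to place each object at the centre of its bounding box are all assumptions made for this sketch. In particular, no real SAM 3D API is used; the reconstruction model is passed in as an opaque callable.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class Instance:
    """One segmented building (hypothetical type).

    box is a pixel-space bounding box: (x_min, y_min, x_max, y_max).
    """
    box: Tuple[float, float, float, float]


@dataclass
class Mesh:
    """A minimal stand-in for a reconstructed object: vertices only."""
    vertices: List[Point3D]


def segment_reconstruct_compose(
    image: Any,
    segment: Callable[[Any], List[Instance]],
    reconstruct: Callable[[Any, Instance], Mesh],
    ground_units_per_pixel: float = 1.0,
) -> List[Mesh]:
    """Run the three stages and return one placed mesh per building.

    segment:     image -> building instances (stage 1)
    reconstruct: (image, instance) -> mesh in the object's local frame (stage 2)
    The compose stage (stage 3) is assumed here to be a pure translation of
    each local mesh to the centre of its bounding box on the ground plane.
    """
    scene: List[Mesh] = []
    for inst in segment(image):
        local_mesh = reconstruct(image, inst)
        x_min, y_min, x_max, y_max = inst.box
        # Bounding-box centre, converted from pixels to ground units.
        cx = 0.5 * (x_min + x_max) * ground_units_per_pixel
        cy = 0.5 * (y_min + y_max) * ground_units_per_pixel
        placed = Mesh(
            vertices=[(x + cx, y + cy, z) for (x, y, z) in local_mesh.vertices]
        )
        scene.append(placed)
    return scene
```

Passing the segmenter and the reconstructor as callables keeps the sketch independent of any particular model. A real composition step would likely also need per-object scale and orientation, which this outline leaves out.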

Innovation

Methods, ideas, or system contributions that make the work stand out.

SAM 3D, a general-purpose image-to-3D foundation model, applied to monocular building reconstruction
Segment-reconstruct-compose pipeline for urban scene modeling
Benchmarked using FID and CMMD metrics against TRELLIS
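
To make the CMMD metric more concrete, the following is a minimal sketch of a squared Maximum Mean Discrepancy between two sets of feature vectors. It makes several assumptions: the embeddings (for example CLIP features of reference and generated images) are already computed, a Gaussian RBF kernel is used, the bandwidth `sigma` is an arbitrary illustrative value, and the simple biased estimator is used for clarity. The function names `rbf_kernel` and `mmd_squared` are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, sigma: float) -> float:
    """Gaussian RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two embedding sets.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)
    It is zero when both sets are identical and grows as they separate.
    """
    if not xs or not ys:
        raise ValueError("both embedding sets must be non-empty")
    return (
        mean_kernel(xs, xs, sigma)
        + mean_kernel(ys, ys, sigma)
        - 2.0 * mean_kernel(xs, ys, sigma)
    )
```

Lower values indicate that the two embedding distributions are closer. Unlike FID, which compares Gaussian fits (means and covariances) of Inception features, an MMD of this form compares the sets directly through the kernel and makes no Gaussian assumption.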