Joint 2D-3D Segmentation and Association in Street-level Imaging

📅 2026-05-26
🤖 AI Summary
This work addresses the disconnect between 2D semantic segmentation and 3D geometry in street-level imagery, as well as unstable cross-view object association. The authors propose a unified framework that combines zero-shot 2D detection and segmentation with Structure-from-Motion (SfM) reconstruction, and introduce a 3D geometric-consistency-driven mechanism for associating objects across views. By replacing conventional 2D multi-object tracking, the method preserves object identity more robustly under wide-baseline viewpoints and varying imaging conditions, and it extends to multiple object types. Experiments report substantially improved coverage of ground-truth sequences and more robust identity retention than state-of-the-art 2D-only tracking methods, with a 22% performance gain in challenging urban scenarios.
📝 Abstract
Accurate interpretation of street-level imagery is essential for large-scale urban mapping and the creation of Spatial Digital Twin (SDT) environments. This work presents a unified framework for joint 2D-3D segmentation and association that integrates visual semantics with multi-view geometric reasoning. Unlike conventional approaches that rely heavily on sequential frames for temporal tracking, our method leverages zero-shot detection and segmentation together with structure-from-motion reconstruction to establish stable cross-view correspondences. A 3D-driven association mechanism replaces traditional 2D multi-object tracking, using geometric consistency to guide identity preservation across wide-baseline viewpoints and varying imaging conditions. By combining 2D texture cues with global 3D context, the proposed pipeline is well-suited for scalable street-level processing and can be used for a variety of object types. Experiments demonstrate substantially improved coverage of ground-truth sequences and more robust identity retention compared to state-of-the-art 2D-only tracking methods, achieving a 22% performance gain in challenging urban scenarios.
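The abstract does not spell out how the 3D-driven association works, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each 2D detection mask has already been linked to the IDs of the SfM 3D points that project inside it, and it groups detections from different images into one identity when they share enough 3D points. The function name `associate_by_shared_points`, the `min_shared` threshold, and the input layout are all hypothetical.

```python
from itertools import combinations


def associate_by_shared_points(detections, min_shared=3):
    """Group detections across views by shared SfM 3D point IDs.

    detections: dict mapping (image_id, detection_id) -> set of 3D point IDs
                observed inside that detection's mask (assumed input layout).
    Returns a list of sets of detection keys; each set is one object identity.
    """
    keys = sorted(detections)
    parent = {k: k for k in keys}  # union-find forest over detections

    def find(k):
        # Follow parent links to the root, compressing the path on the way.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for a, b in combinations(keys, 2):
        if a[0] == b[0]:
            # Two detections in the same image are never linked directly.
            # (They can still end up merged transitively; a fuller version
            # would have to guard against that.)
            continue
        if len(detections[a] & detections[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for k in keys:
        groups.setdefault(find(k), set()).add(k)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Toy example: image names and point IDs are made up for illustration.
detections = {
    ("img0", 0): {1, 2, 3, 4},
    ("img0", 1): {10, 11, 12},
    ("img5", 0): {2, 3, 4, 5},
    ("img9", 0): {10, 11, 12, 13},
    ("img9", 1): {20, 21},
}
tracks = associate_by_shared_points(detections)
for track in tracks:
    print(sorted(track))
```

Because the grouping depends only on shared 3D points and not on frame order, `img0` and `img9` can be linked without any intermediate frames, which is the property the abstract contrasts with sequential 2D tracking.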
Problem

Research questions and friction points this paper is trying to address.

2D-3D segmentation
cross-view association
street-level imaging
Spatial Digital Twin
object identity preservation
Innovation

Methods, ideas, or system contributions that make the work stand out.

joint 2D-3D segmentation
zero-shot detection
structure-from-motion
3D-driven association
geometric consistency