🤖 AI Summary
Multi-view multi-object association remains a critical challenge in 3D reconstruction, particularly in dense scenes with visually indistinguishable objects and low camera overlap, where methods that rely on appearance features or on epipolar constraints alone are not robust enough. This paper proposes C-DOG, a training-free, geometry-driven framework. First, each 2D detection becomes a node in a connected δ-overlap graph whose edges are weighted by epipolar-geometric consistency. Second, δ-neighbor-overlap clustering groups strongly consistent detections into instances while tolerating noise and partial connectivity. Finally, interquartile range (IQR) filtering and a 3D back-projection error criterion remove inconsistent observations. The method uses no appearance features at all. On synthetic benchmarks it outperforms existing geometry-based baselines under high object density, noisy observations, and limited camera overlap, which suggests it is well-suited to scalable real-world 3D reconstruction.
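As a rough illustration of the IQR filtering step, the sketch below applies a standard Tukey fence to a list of per-observation back-projection errors. The function names, the multiplier `k = 1.5`, and the choice to apply only an upper fence (errors are non-negative) are assumptions made for this example, not details taken from the paper.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted, non-empty list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def iqr_inliers(errors, k=1.5):
    """Return indices of observations whose error is below Q3 + k * IQR.

    `errors` is a hypothetical list of back-projection errors (in pixels),
    one per 2D observation in a candidate group.
    """
    if not errors:
        return []
    s = sorted(errors)
    q1 = quantile(s, 0.25)
    q3 = quantile(s, 0.75)
    upper = q3 + k * (q3 - q1)
    return [i for i, e in enumerate(errors) if e <= upper]


# Five consistent observations and one gross outlier (index 5).
errors = [0.8, 1.1, 0.9, 1.0, 1.2, 9.5]
kept = iqr_inliers(errors)
```

Here the last observation falls well above the fence and is dropped, while the five consistent ones are kept.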
📝 Abstract
Multi-view multi-object association is a fundamental step in 3D reconstruction pipelines, enabling consistent grouping of object instances across multiple camera views. Existing methods often rely on appearance features or geometric constraints such as epipolar consistency. However, these approaches can fail when objects are visually indistinguishable or observations are corrupted by noise. We propose C-DOG, a training-free framework that serves as an intermediate module bridging object detection (or pose estimation) and 3D reconstruction, without relying on visual features. It combines connected delta-overlap graph modeling with epipolar geometry to robustly associate detections across views. Each 2D observation is represented as a graph node, with edges weighted by epipolar consistency. A delta-neighbor-overlap clustering step identifies strongly consistent groups while tolerating noise and partial connectivity. To further improve robustness, we incorporate Interquartile Range (IQR)-based filtering and a 3D back-projection error criterion to eliminate inconsistent observations. Extensive experiments on synthetic benchmarks demonstrate that C-DOG outperforms geometry-based baselines and remains robust under challenging conditions, including high object density, the absence of visual features, and limited camera overlap, making it well-suited for scalable 3D reconstruction in real-world scenarios.
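To make "edges weighted by epipolar consistency" concrete, here is a minimal sketch that scores a pair of detections from two views by their symmetric epipolar distance under a known fundamental matrix `F`. The exponential mapping from distance to weight and the `sigma` parameter are assumptions chosen for illustration; the abstract does not specify the weighting function C-DOG uses.

```python
import math


def mat_vec(M, v):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]


def point_line_distance(pt, line):
    """Distance from pixel (x, y) to the line a*x + b*y + c = 0."""
    a, b, c = line
    norm = math.hypot(a, b)
    if norm == 0.0:
        return math.inf  # degenerate epipolar line
    return abs(a * pt[0] + b * pt[1] + c) / norm


def symmetric_epipolar_distance(F, x1, x2):
    """Mean of the two point-to-epipolar-line distances for x1 <-> x2.

    Convention: x2^T F x1 = 0 for a perfect correspondence.
    """
    h1 = [x1[0], x1[1], 1.0]
    h2 = [x2[0], x2[1], 1.0]
    line_in_view2 = mat_vec(F, h1)             # epipolar line of x1 in view 2
    line_in_view1 = mat_vec(transpose(F), h2)  # epipolar line of x2 in view 1
    d2 = point_line_distance(x2, line_in_view2)
    d1 = point_line_distance(x1, line_in_view1)
    return 0.5 * (d1 + d2)


def edge_weight(F, x1, x2, sigma=2.0):
    """Hypothetical weight in (0, 1]: 1 for a perfect match, decaying with distance."""
    return math.exp(-symmetric_epipolar_distance(F, x1, x2) / sigma)


# Rectified stereo pair (pure horizontal translation): matches share a row.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
w_good = edge_weight(F, (10.0, 5.0), (3.0, 5.0))  # same row
w_bad = edge_weight(F, (10.0, 5.0), (3.0, 8.0))   # rows differ by 3 px
```

With this rectified `F`, the distance reduces to the difference in row coordinates, so the first pair gets the maximum weight and the second a lower one. A graph-based association step could then use such weights on edges between detections from different views.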