🤖 AI Summary
This work addresses inconsistent and non-robust 3D reconstruction when the same object is observed multiple times, either repeated within a scene or across scans. Existing approaches either reconstruct each observation independently, discarding the shared signal, or rely on explicit matching and rigid alignment, which is sensitive to incorrect associations and does not extend to non-rigid deformations. The authors propose the Joint Reconstruction Model (JRM), which frames multi-observation reconstruction as personalized generation. JRM is a 3D flow-matching generative model that implicitly aggregates unaligned observations in its latent space, without explicit correspondence or rigid alignment, so that the reconstructions share the object's common characteristics while each stays faithful to its own pose or deformation state. Because no rigid alignment is imposed, this implicit aggregation also handles non-rigid changes such as articulation. Experiments on synthetic and real-world data show that JRM outperforms both independent-reconstruction and alignment-based baselines in reconstruction quality and is more robust to incorrect associations.
📝 Abstract
Object-centric reconstruction seeks to recover the 3D structure of a scene by composing independent objects. While this independence can simplify modeling, it discards strong signals that could improve reconstruction, notably repetition, where the same object model is seen multiple times in a scene or across scans. We propose the Joint Reconstruction Model (JRM) to leverage repetition by framing object reconstruction as personalized generation: multiple observations share a common subject that should be consistent across all of them, while each reconstruction still adheres to the specific pose and state of its own observation. Prior methods in this direction rely on explicit matching and rigid alignment across observations, making them sensitive to errors and difficult to extend to non-rigid transformations. In contrast, JRM is a 3D flow-matching generative model that implicitly aggregates unaligned observations in its latent space, learning to produce consistent and faithful reconstructions in a data-driven manner without explicit constraints. Evaluations on synthetic and real-world data show that JRM's implicit aggregation removes the need for explicit alignment, improves robustness to incorrect associations, and naturally handles non-rigid changes such as articulation. Overall, JRM outperforms both independent and alignment-based baselines in reconstruction quality.
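The abstract names flow matching and order-free aggregation of unaligned observations but does not specify JRM's architecture. As a generic illustration only, the sketch below shows the standard linear-path (rectified-flow style) flow-matching training target and Euler sampling, conditioned on a set of observations. The helper `aggregate_observations` (mean pooling) and the `oracle_velocity` field are hypothetical stand-ins chosen for this example: the first for whatever learned aggregation JRM uses, the second for a trained velocity network. Neither is the authors' method.

```python
import numpy as np


def flow_matching_pair(x0: np.ndarray, x1: np.ndarray, t: float):
    """Generic flow matching with a linear (rectified-flow style) path.

    x0 is a noise sample, x1 a data sample (e.g. a latent 3D shape code).
    Returns the interpolated point x_t and the regression target x1 - x0
    that a velocity network would be trained to predict at (x_t, t).
    """
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target


def aggregate_observations(obs_features: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL aggregation: mean-pool per-observation features.

    obs_features has shape (num_observations, dim). Mean pooling ignores
    observation order, so no explicit matching between observations is
    needed. This is an illustrative stand-in, not JRM's mechanism.
    """
    return obs_features.mean(axis=0)


def euler_sample(velocity_fn, x0: np.ndarray, cond: np.ndarray, steps: int = 10):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t=0 (noise) to t=1."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t stays below 1, so the toy field below never divides by zero
        x = x + dt * velocity_fn(x, t, cond)
    return x


def oracle_velocity(x: np.ndarray, t: float, cond: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL toy field: the exact velocity of the straight path that
    ends at `cond`. A trained network would predict a velocity instead."""
    return (cond - x) / (1.0 - t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three unordered observations of the same object, as 8-dim feature vectors.
    observations = rng.normal(size=(3, 8))
    shared = aggregate_observations(observations)
    sample = euler_sample(oracle_velocity, rng.normal(size=8), shared)
    print(np.abs(sample - shared).max())  # close to zero
```

Mean pooling is used here only because it is the simplest reduction that is invariant to observation order; attention-based set encoders are a common alternative, and the abstract does not state which, if either, JRM uses. With the toy oracle field, the Euler integrator lands on the conditioning vector at t = 1, which is what a perfectly trained velocity model along straight paths would also do.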