E-M3RF: An Equivariant Multimodal 3D Re-assembly Framework

📅 2025-11-26

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

Existing 3D fragment reassembly methods rely solely on geometric features, exhibiting poor robustness for small, eroded, or symmetric fragments and suffering from interpenetration due to the absence of physical constraints. This paper proposes the first equivariant multimodal SE(3) flow-matching framework for fragment reassembly. It integrates a rotation-equivariant geometric encoder with a color-aware Transformer to jointly model geometric and chromatic modalities. By explicitly learning rigid-body motion trajectories via SE(3) flow matching, the method inherently avoids interpenetration and enhances registration accuracy for symmetric and incomplete fragments. Evaluated on the RePAIR dataset, our approach reduces rotational and translational errors by 23.1% and 13.2%, respectively, and decreases Chamfer distance by 18.4%, significantly improving both reconstruction accuracy and physical plausibility for real-world cultural heritage artifacts.

Technology Category

Application Category

📝 Abstract

3D reassembly is a fundamental geometric problem, and in recent years it has increasingly been challenged by deep learning methods rather than classical optimization. While learning approaches have shown promising results, most still rely primarily on geometric features to assemble a whole from its parts. As a result, methods struggle when geometry alone is insufficient or ambiguous, for example, for small, eroded, or symmetric fragments. Additionally, solutions do not impose physical constraints that explicitly prevent overlapping assemblies. To address these limitations, we introduce E-M3RF, an equivariant multimodal 3D reassembly framework that takes as input the point clouds, containing both point positions and colors of fractured fragments, and predicts the transformations required to reassemble them using SE(3) flow matching. Each fragment is represented by both geometric and color features: i) 3D point positions are encoded as rotationconsistent geometric features using a rotation-equivariant encoder, ii) the colors at each 3D point are encoded with a transformer. The two feature sets are then combined to form a multimodal representation. We experimented on four datasets: two synthetic datasets, Breaking Bad and Fantastic Breaks, and two real-world cultural heritage datasets, RePAIR and Presious, demonstrating that E-M3RF on the RePAIR dataset reduces rotation error by 23.1% and translation error by 13.2%, while Chamfer Distance decreases by 18.4% compared to competing methods.

Problem

Research questions and friction points this paper is trying to address.

Addresses ambiguous 3D reassembly when geometry alone is insufficient

Prevents physically overlapping assemblies using multimodal fragment representations

Handles eroded or symmetric fragments by combining geometric and color features

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses rotation-equivariant encoder for geometric features

Combines geometric and color features with transformer

Employs SE(3) flow matching for transformation prediction

🔎 Similar Papers

Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View