Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation

📅 2025-04-09
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing benchmarks primarily focus on geometric fragments or industrial parts and fail to capture the functional and spatial constraints inherent in everyday object assembly (e.g., plugging in a charger, arranging flowers in a vase, inserting bread into a toaster). To address this gap, we introduce 2BY2, the first large-scale, fine-grained dataset for daily pairwise 3D object assembly, comprising 1,034 instances across 517 object pairs with precise SE(3) pose and symmetry annotations. Leveraging 2BY2, we establish a multi-task pairwise assembly benchmark and propose a two-step SE(3) pose estimation method: (i) equivariant feature fusion to jointly model geometry, functionality, and spatial constraints; and (ii) semantic-guided pose refinement. Our approach achieves state-of-the-art performance across all 18 tasks in 2BY2, generalizes to unseen object combinations, and performs reliably on real robotic hardware.

📝 Abstract
3D assembly tasks, such as furniture assembly and component fitting, play a crucial role in daily life and represent essential capabilities for future home robots. Existing benchmarks and datasets predominantly focus on assembling geometric fragments or factory parts, and so fall short of capturing the complexity of everyday object interactions and assemblies. To bridge this gap, we present 2BY2, a large-scale annotated dataset for daily pairwise object assembly, covering 18 fine-grained tasks that reflect real-life scenarios, such as plugging into sockets, arranging flowers in vases, and inserting bread into toasters. The 2BY2 dataset includes 1,034 instances and 517 object pairs with pose and symmetry annotations, requiring approaches that align geometric shapes while accounting for functional and spatial relationships between objects. Leveraging the 2BY2 dataset, we propose a two-step SE(3) pose estimation method with equivariant features for assembly constraints. Compared to previous shape assembly methods, our approach achieves state-of-the-art performance across all 18 tasks in the 2BY2 dataset. Robot experiments further validate the reliability and generalization ability of our method for complex 3D assembly tasks.
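To make the notion of an SE(3) pose with symmetry annotations concrete, the following is a minimal sketch of how a predicted rigid pose (rotation R, translation t) might be applied to an object's points and compared with a ground-truth pose. It is not taken from the paper: the function names, the choice of error measures, and the toy values are all assumptions made for illustration, and the abstract does not specify how 2BY2 uses its symmetry annotations during evaluation.

```python
import numpy as np


def rot_z(theta):
    """Rotation matrix for a rotation of theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def apply_pose(R, t, points):
    """Apply the rigid transform x -> R @ x + t to an (N, 3) point array."""
    return points @ R.T + t


def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle in degrees between two rotation matrices."""
    cos_angle = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def axis_symmetric_rotation_error_deg(R_pred, R_gt, axis):
    """Rotation error that ignores spin about a symmetry axis.

    Hypothetical helper: for an object that looks the same under any
    rotation about `axis`, only the direction of the rotated axis matters.
    """
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    cos_angle = float((R_pred @ axis) @ (R_gt @ axis))
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def translation_error(t_pred, t_gt):
    """Euclidean distance between predicted and ground-truth translations."""
    diff = np.asarray(t_pred, dtype=float) - np.asarray(t_gt, dtype=float)
    return float(np.linalg.norm(diff))


# Toy values (assumed, not from the dataset): the prediction is off by
# 10 degrees about z and by 2 cm along z.
R_gt, t_gt = rot_z(np.radians(90.0)), np.array([0.0, 0.0, 0.10])
R_pred, t_pred = rot_z(np.radians(80.0)), np.array([0.0, 0.0, 0.12])

print(rotation_error_deg(R_pred, R_gt))
print(axis_symmetric_rotation_error_deg(R_pred, R_gt, [0.0, 0.0, 1.0]))
print(translation_error(t_pred, t_gt))
```

In this toy case the plain rotation error is about 10 degrees, while the symmetry-aware error is zero because both rotations only spin the object about its assumed symmetry axis. This shows why symmetry annotations matter when scoring poses of objects such as round plugs or vases.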
Problem

Research questions and friction points this paper is trying to address.

Addressing complexities in everyday object interactions and assemblies
Providing a dataset for daily pairwise object assembly tasks
Developing a method for 3D assembly with functional and spatial constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale annotated dataset for daily pairwise object assembly
Two-step SE(3) pose estimation method
Equivariant features for assembly constraints
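
As a rough illustration of what "equivariant" means for the features listed above, the sketch below checks the property numerically on a hand-crafted feature: rotating the input point cloud rotates the feature by the same rotation, and translating the input leaves it unchanged. The feature itself (`equivariant_feature`) is a hypothetical stand-in chosen for simplicity and is not the learned network described in the paper.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random 3x3 rotation matrix via QR decomposition."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0  # flip one column so that det(Q) = +1
    return Q


def equivariant_feature(points):
    """Toy SO(3)-equivariant, translation-invariant vector feature.

    Each centered point is weighted by its distance to the centroid and the
    results are averaged. Distances are rotation-invariant, so the output
    vector rotates exactly as the input does.
    """
    centered = points - points.mean(axis=0)
    weights = np.linalg.norm(centered, axis=1, keepdims=True)
    return (weights * centered).mean(axis=0)


rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))
R = random_rotation(rng)
t = np.array([0.3, -0.2, 0.5])

f_original = equivariant_feature(points)
f_transformed = equivariant_feature(points @ R.T + t)

# Equivariance check: f(R x + t) should equal R f(x).
print(np.allclose(f_transformed, R @ f_original))
```

A pose estimator built on features with this property does not have to relearn the same assembly relation for every input orientation, which is one common motivation for using equivariant features in pose estimation.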