ART: Articulated Reconstruction Transformer

📅 2025-12-16
🤖 AI Summary
This work addresses the challenging problem of reconstructing complete articulated 3D objects directly from sparse, multi-state RGB images, without iterative optimization or category-specific priors. The proposed method introduces a part-slot Transformer architecture that explicitly models an object as an assembly of rigid parts connected by physically interpretable joints, and it predicts each part's geometry, texture, and explicit articulation parameters end to end. Key components include learnable part slots that read features from the sparse input images, a decoder that jointly outputs geometry, texture, and joint parameters for every part, and per-part supervision during training. Across multiple benchmarks, the approach significantly outperforms existing baselines, yielding high-fidelity, category-agnostic articulated 3D reconstructions from images that can be exported and simulated physically.
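To make "explicit articulation parameters" concrete, here is a minimal sketch of how a rigid part could be posed once its joint is known. It assumes two common joint types: revolute (rotation about an axis through a pivot, computed with Rodrigues' formula) and prismatic (translation along an axis). The `Joint` fields and the `articulate` function are hypothetical illustrations, not ART's actual output format.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Joint:
    """Hypothetical explicit joint parameters for one movable rigid part."""
    kind: str     # "revolute" (rotate about axis) or "prismatic" (slide along axis)
    axis: Vec3    # axis direction; normalized inside articulate()
    pivot: Vec3   # a point on the axis (only used by revolute joints)


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    if n == 0:
        raise ValueError("joint axis must be non-zero")
    return (v[0] / n, v[1] / n, v[2] / n)


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def articulate(points: List[Vec3], joint: Joint, state: float) -> List[Vec3]:
    """Pose a rigid part at a joint state: radians for revolute, distance for prismatic."""
    k = _normalize(joint.axis)
    if joint.kind == "prismatic":
        return [tuple(p[i] + state * k[i] for i in range(3)) for p in points]
    if joint.kind == "revolute":
        c, s = math.cos(state), math.sin(state)
        posed = []
        for p in points:
            v = tuple(p[i] - joint.pivot[i] for i in range(3))   # move pivot to origin
            kxv, kdv = _cross(k, v), _dot(k, v)
            # Rodrigues' rotation: v' = v cos + (k x v) sin + k (k . v)(1 - cos)
            r = tuple(v[i] * c + kxv[i] * s + k[i] * kdv * (1 - c) for i in range(3))
            posed.append(tuple(r[i] + joint.pivot[i] for i in range(3)))
        return posed
    raise ValueError(f"unknown joint kind: {joint.kind}")


# A door hinged on the z-axis and a drawer sliding along x.
door = Joint("revolute", axis=(0.0, 0.0, 1.0), pivot=(0.0, 0.0, 0.0))
drawer = Joint("prismatic", axis=(1.0, 0.0, 0.0), pivot=(0.0, 0.0, 0.0))
print(articulate([(1.0, 0.0, 0.0)], door, math.pi / 2))   # approximately [(0.0, 1.0, 0.0)]
print(articulate([(0.0, 0.0, 0.0)], drawer, 0.3))         # [(0.3, 0.0, 0.0)]
```

Because every point of a part moves by the same rigid transform, a single scalar joint state animates the whole part, which is what makes this kind of explicit parameterization straightforward to drive in a simulator.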

📝 Abstract
We introduce ART, Articulated Reconstruction Transformer -- a category-agnostic, feed-forward model that reconstructs complete 3D articulated objects from only sparse, multi-state RGB images. Previous methods for articulated object reconstruction either rely on slow optimization with fragile cross-state correspondences or use feed-forward models limited to specific object categories. In contrast, ART treats articulated objects as assemblies of rigid parts, formulating reconstruction as part-based prediction. Our newly designed transformer architecture maps sparse image inputs to a set of learnable part slots, from which ART jointly decodes unified representations for individual parts, including their 3D geometry, texture, and explicit articulation parameters. The resulting reconstructions are physically interpretable and readily exportable for simulation. Trained on a large-scale, diverse dataset with per-part supervision, and evaluated across diverse benchmarks, ART achieves significant improvements over existing baselines and establishes a new state of the art for articulated object reconstruction from image inputs.
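The slot-based readout described in the abstract can be sketched in pure Python: a fixed number of learnable slot vectors act as queries that cross-attend to image tokens, and small heads decode each attended slot into per-part outputs. Everything here is an assumption for illustration. `PartSlotHead`, the head sizes, and the 7-number joint layout are hypothetical, the weights are random rather than trained, and the sketch omits the learned query/key/value projections, multi-head attention, and stacked layers a real transformer would use.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(slots: Matrix, tokens: Matrix) -> Matrix:
    """Scaled dot-product attention with slots as queries and image tokens as keys and values."""
    d = len(slots[0])
    tokens_t = [list(col) for col in zip(*tokens)]        # (d x N)
    scores = matmul(slots, tokens_t)                       # (S x N)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, tokens)                         # (S x d)


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class PartSlotHead:
    """Hypothetical toy: K slots read image tokens, then linear heads decode per-part outputs."""

    def __init__(self, num_slots: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.slots = rand_matrix(num_slots, dim, rng)   # would be learned parameters
        self.geometry_head = rand_matrix(dim, 8, rng)   # hypothetical 8-d shape code
        self.texture_head = rand_matrix(dim, 3, rng)    # hypothetical mean RGB
        self.joint_head = rand_matrix(dim, 7, rng)      # hypothetical axis(3) + pivot(3) + state(1)

    def __call__(self, tokens: Matrix) -> Dict[str, Matrix]:
        parts = cross_attention(self.slots, tokens)     # one feature vector per slot
        return {
            "geometry": matmul(parts, self.geometry_head),
            "texture": matmul(parts, self.texture_head),
            "joint": matmul(parts, self.joint_head),
        }


# Stand-in for patch tokens from several views/states (random here, not real image features).
tokens = rand_matrix(12, 32, random.Random(1))
outputs = PartSlotHead(num_slots=4, dim=32)(tokens)
print({k: (len(v), len(v[0])) for k, v in outputs.items()})
# -> {'geometry': (4, 8), 'texture': (4, 3), 'joint': (4, 7)}
```

Because the slot count is fixed, the model emits the same number of part hypotheses no matter how many image tokens it receives. How ART handles slots that correspond to no actual part is not described here, so this sketch leaves that out.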
Problem

Reconstructing complete 3D articulated objects from sparse, multi-state RGB images
Jointly predicting per-part geometry, texture, and articulation parameters
Avoiding the slow, correspondence-fragile optimization and category-specific limits of prior methods
Innovation

Transformer architecture for part-based 3D reconstruction
Maps sparse multi-state RGB images to a set of learnable part slots
Predicts geometry, texture, and articulation parameters jointly
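The abstract says the reconstructions are readily exportable for simulation but does not name a format here. As one plausible target, the sketch below writes predicted parts and joints to URDF, a widely used XML robot-description format, using Python's standard `xml.etree.ElementTree`. `PredictedPart`, the mesh filenames, the star-shaped tree (every movable part attached to one base link), and the placeholder `effort`/`velocity` values are assumptions. It also glosses over frames: URDF places each child link in its joint's frame, so a real exporter would re-express part meshes relative to the joint origin.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PredictedPart:
    """Hypothetical container for one reconstructed part and the joint attaching it to the base."""
    name: str
    mesh_file: str                                  # hypothetical path to the exported part mesh
    joint_type: str = "fixed"                       # URDF joint type: "revolute", "prismatic", "fixed"
    axis: Tuple[float, float, float] = (0.0, 0.0, 1.0)
    origin: Tuple[float, float, float] = (0.0, 0.0, 0.0)   # joint origin in the base frame
    limits: Tuple[float, float] = (0.0, 0.0)       # lower, upper (radians or meters)


def _fmt(values) -> str:
    return " ".join(f"{x:g}" for x in values)


def to_urdf(robot_name: str, base: PredictedPart, parts: List[PredictedPart]) -> str:
    """Write one <link> per part and one <joint> per movable part, all attached to `base`."""
    robot = ET.Element("robot", name=robot_name)
    for part in [base] + parts:
        link = ET.SubElement(robot, "link", name=part.name)
        geometry = ET.SubElement(ET.SubElement(link, "visual"), "geometry")
        ET.SubElement(geometry, "mesh", filename=part.mesh_file)
    for part in parts:
        joint = ET.SubElement(robot, "joint", name=f"{part.name}_joint", type=part.joint_type)
        ET.SubElement(joint, "parent", link=base.name)
        ET.SubElement(joint, "child", link=part.name)
        ET.SubElement(joint, "origin", xyz=_fmt(part.origin), rpy="0 0 0")
        if part.joint_type in ("revolute", "prismatic"):
            ET.SubElement(joint, "axis", xyz=_fmt(part.axis))
            # URDF requires effort/velocity on <limit>; these values are placeholders.
            ET.SubElement(joint, "limit", lower=f"{part.limits[0]:g}",
                          upper=f"{part.limits[1]:g}", effort="10", velocity="1")
    return ET.tostring(robot, encoding="unicode")


base = PredictedPart("body", "body.obj")
door = PredictedPart("door", "door.obj", "revolute", (0.0, 0.0, 1.0), (0.4, 0.0, 0.0), (0.0, 1.57))
print(to_urdf("cabinet", base, [door]))
```

Keeping joint type, axis, origin, and limits as explicit per-part fields is what makes this translation a direct field-by-field copy rather than a fitting step.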