RigidFormer: Learning Rigid Dynamics using Transformers

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing learning-based approaches to multi-rigid-body dynamics simulation are hindered by contact discontinuities, error accumulation over long rollouts, reliance on mesh connectivity, and difficulty handling unstructured inputs such as point clouds. This work proposes RigidFormer, an object-centric Transformer that advances each object's state through a compact set of anchors and uses Anchor-Vertex Pooling to enrich those anchors with local geometric information. Anchor-based Rotary Position Embedding (RoPE) injects anchor geometry into attention while respecting the unordered nature of objects and anchors: object-token processing is permutation-equivariant, and the mean-pooled anchor descriptor is invariant to anchor reindexing. Differentiable Kabsch alignment then projects state updates onto the rigid-body manifold. This framework is the first to enable efficient and scalable rigid-body dynamics learning directly from raw point clouds, matching or surpassing mesh-based methods on standard benchmarks while supporting cross-resolution and cross-dataset generalization, faster inference, scalability to over 200 objects, and a preliminary extension to instruction-conditioned articulated-body simulation.
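To make the rigidity projection concrete, the following is a minimal NumPy sketch of standard Kabsch alignment, not the authors' implementation. It assumes each object carries a set of rest-pose anchor points and that the network emits unconstrained predicted anchor positions; the function name `kabsch_project` and the variables `rest`, `noisy`, `R_true`, and `t_true` are hypothetical. Only the forward pass is shown. In an autodiff framework the same SVD-based steps would be differentiable away from degenerate configurations.

```python
import numpy as np

def kabsch_project(rest_pts, predicted_pts):
    """Fit the least-squares rigid transform mapping rest_pts onto predicted_pts.

    Hypothetical helper: returns (R, t, projected), where
    projected = rest_pts @ R.T + t is the rigid version of the prediction.
    """
    rest_c = rest_pts.mean(axis=0)
    pred_c = predicted_pts.mean(axis=0)
    P = rest_pts - rest_c            # centered rest-pose anchors, shape (K, 3)
    Q = predicted_pts - pred_c       # centered predicted anchors, shape (K, 3)
    H = P.T @ Q                      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = pred_c - R @ rest_c
    return R, t, rest_pts @ R.T + t

# Assumed toy data: 8 anchors, a known rotation about z, and small noise
# standing in for a non-rigid network prediction.
rng = np.random.default_rng(0)
rest = rng.normal(size=(8, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -0.2, 1.0])
noisy = rest @ R_true.T + t_true + 0.01 * rng.normal(size=rest.shape)

R, t, projected = kabsch_project(rest, noisy)
```

Because `projected` is an exact rigid transform of `rest`, all pairwise anchor distances are preserved even though `noisy` was not rigid, which is the sense in which the update is projected back onto the rigid-body manifold.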

📝 Abstract

Learning-based simulation of multi-object rigid-body dynamics remains difficult because contact is discontinuous and errors compound over long horizons. Most existing methods remain tied to mesh connectivity and vertex-level message passing, which limits their applicability to mesh-free inputs such as point clouds and leads to high computational cost. Efficiently modeling high-fidelity rigid-body dynamics from mesh-free representations, therefore, remains challenging. We introduce RigidFormer, an object-centric Transformer-based model that learns mesh-free rigid-body dynamics with controllable integration step sizes. RigidFormer reasons at the object level and advances each object through compact anchors; Anchor-Vertex Pooling enriches these anchors with local vertex features, retaining contact-relevant geometry without dense vertex-level interaction. We propose Anchor-based RoPE to inject anchor geometry into attention while respecting the unordered nature of objects and anchors: object-token processing is permutation-equivariant, and the mean-pooled anchor descriptor is invariant to anchor reindexing while preserving shape extent. RigidFormer further enforces rigidity by projecting updates onto the rigid-body manifold using differentiable Kabsch alignment. On standard benchmarks, RigidFormer outperforms or matches mesh-based baselines using point inputs, runs faster, generalizes to unseen point resolutions and across datasets, and scales to 200+ objects; we also show a preliminary extension to command-conditioned articulated bodies by treating body parts as interacting object-level components.
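The two symmetry properties named in the abstract can be checked numerically. The sketch below is illustrative only and does not reproduce Anchor-based RoPE: `self_attention` is a plain single-head attention layer with no index-based positional encoding, and `anchor_descriptor` uses an assumed, hypothetical feature choice (absolute and squared centered offsets) purely to show mean pooling at work. All names, shapes, and weights are assumptions.

```python
import numpy as np

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head self-attention over object tokens, no index-based positions."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def anchor_descriptor(anchors):
    """Mean-pooled per-anchor features (hypothetical feature choice)."""
    offsets = anchors - anchors.mean(axis=0)
    feats = np.concatenate([np.abs(offsets), offsets ** 2], axis=-1)
    return feats.mean(axis=0)

rng = np.random.default_rng(1)
tokens = rng.normal(size=(5, 16))                  # 5 object tokens, width 16
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
obj_perm = rng.permutation(5)

# Permutation equivariance: reordering the objects reorders the outputs
# in the same way.
out = self_attention(tokens, Wq, Wk, Wv)
out_perm = self_attention(tokens[obj_perm], Wq, Wk, Wv)
equivariant = np.allclose(out_perm, out[obj_perm])

# Invariance to anchor reindexing: shuffling anchors leaves the descriptor
# unchanged, while rescaling the shape changes it.
anchors = rng.normal(size=(6, 3))                  # 6 anchors on one object
anchor_perm = rng.permutation(6)
invariant = np.allclose(anchor_descriptor(anchors[anchor_perm]),
                        anchor_descriptor(anchors))
extent_sensitive = not np.allclose(anchor_descriptor(2.0 * anchors),
                                   anchor_descriptor(anchors))
```

Mean pooling discards anchor order but not anchor spread, so in this toy setup a descriptor built this way still distinguishes a small object from a larger one with the same anchor layout.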

Problem

Research questions and friction points this paper is trying to address.

rigid-body dynamics

mesh-free representation

point clouds

multi-object simulation

contact discontinuity

Innovation

Methods, ideas, or system contributions that make the work stand out.

RigidFormer

mesh-free dynamics

object-centric Transformer