ClothTransformer: Unified Latent-Space Transformers for Scalable Cloth Simulation

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses key limitations of existing neural cloth simulators (restriction to single scenarios, coupling to the mesh discretization, and inadequate collision handling) by introducing a Transformer-based framework that treats cloth simulation as autoregressive sequence modeling. The approach maps cloth dynamics into a latent space with a fixed number of tokens, independent of mesh resolution, so that one model can simulate diverse scenarios. Central contributions include a high-fidelity, penetration-free dataset of roughly 493.4k frames, a differentiable Continuous Collision Detection (CCD) module that this dataset enables, and improved accuracy and generalization. Across three challenging tasks (body-driven garment animation, robotic manipulation, and free-fall collisions), the method achieves approximately 4–9× lower error than prior state-of-the-art methods.
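As a rough illustration of what "autoregressive sequence modeling" in a latent space means, the sketch below rolls a predictor forward by feeding its own outputs back in as context. It is not the paper's implementation: `rollout`, `toy_step`, the context length, and the token shape (8 tokens of width 16) are all hypothetical, and `toy_step` is a trivial stand-in for the Transformer.

```python
import numpy as np

def rollout(step_fn, init_tokens, num_steps, context=4):
    """Autoregressive rollout: each new latent frame is predicted from
    the most recent `context` frames, including earlier predictions."""
    history = [init_tokens]
    for _ in range(num_steps):
        window = np.stack(history[-context:])  # (<=context, K, D)
        history.append(step_fn(window))        # predicted frame, (K, D)
    return np.stack(history)                   # (num_steps + 1, K, D)

def toy_step(window):
    # Hypothetical stand-in for the learned Transformer:
    # a damped copy of the latest latent frame.
    return 0.9 * window[-1]

z0 = np.ones((8, 16))                 # assumed: K=8 latent tokens, width D=16
traj = rollout(toy_step, z0, num_steps=5)
print(traj.shape)
```

Because every step consumes earlier predictions, errors can compound over long rollouts, which is one reason accuracy per step matters for this style of simulator.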

📝 Abstract

Unified and scalable Transformers have recently achieved remarkable success in modeling diverse phenomena traditionally associated with computer graphics, such as 3D visual effects, rendering processes, and motion in videos. In this work, we take a step further by investigating whether modern Transformer techniques can tackle the challenging task of cloth simulation. To this end, we present ClothTransformer, a framework that reformulates cloth simulation as autoregressive sequence modeling in a learned latent space. Existing neural cloth simulators are largely specialized to single scenarios, intrinsically coupled to the mesh discretization, and lack robust collision handling. Our approach addresses these limitations through three contributions: (1) a unified Transformer architecture that handles diverse scenarios (body-driven garments, robotic manipulation, and free-fall collisions) under a single model and achieves approximately 4–9× lower error than prior state-of-the-art methods across all scenarios; (2) a scalable latent-space formulation that compresses arbitrary-resolution meshes into a fixed-size set of latent tokens, making temporal dynamics computation independent of mesh resolution; and (3) a diverse-scenario high-fidelity penetration-free dataset of ~493.4k frames spanning all three settings, which enables a differentiable Continuous Collision Detection (CCD) module to suppress penetration artifacts.
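One common way to compress a variable number of mesh vertices into a fixed-size token set is cross-attention from a set of learned queries; the abstract does not say which mechanism ClothTransformer uses, so the following is only a minimal sketch under that assumption. The function `encode_mesh`, the feature width, and all weights (random here, learned in practice) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode_mesh(vertex_feats, latent_queries, w_k, w_v):
    """Cross-attend K queries over N vertex features.
    Output is (K, D) regardless of the vertex count N."""
    keys = vertex_feats @ w_k                                   # (N, D)
    values = vertex_feats @ w_v                                 # (N, D)
    scores = latent_queries @ keys.T / np.sqrt(keys.shape[-1])  # (K, N)
    attn = softmax(scores, axis=-1)   # each query's weights over vertices
    return attn @ values                                        # (K, D)

rng = np.random.default_rng(0)
F, D, K = 6, 16, 8                    # assumed feature, token, and set sizes
latent_queries = rng.normal(size=(K, D))
w_k = rng.normal(size=(F, D))
w_v = rng.normal(size=(F, D))

coarse = rng.normal(size=(500, F))    # 500-vertex mesh (e.g. position + velocity)
fine = rng.normal(size=(20000, F))    # 20,000-vertex mesh, same feature layout
tokens_coarse = encode_mesh(coarse, latent_queries, w_k, w_v)
tokens_fine = encode_mesh(fine, latent_queries, w_k, w_v)
print(tokens_coarse.shape, tokens_fine.shape)
```

Both meshes produce the same token shape, so any temporal model operating on these tokens has a cost that depends on K rather than on the mesh resolution; only the encoding step scales with the vertex count.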

Problem

Research questions and friction points this paper is trying to address.

cloth simulation

neural simulators

collision handling

mesh discretization

scalability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cloth simulation

Latent-space Transformer

Autoregressive modeling