🤖 AI Summary
Existing motion modeling approaches rely on hand-crafted, fixed hierarchical structures, limiting generalizability and interpretability. This paper proposes a data-driven, differentiable graph learning framework that automatically infers hierarchical graph structures from motion sequences: nodes represent elementary motion units, and directed edges encode parent–child dynamic dependencies. By decomposing absolute motion and performing hierarchical graph reasoning, the framework disentangles complex motion into inherited global patterns and local residual deformations. To our knowledge, this is the first method enabling end-to-end learning and interpretable inference of motion hierarchies without predefined motion primitives. Evaluated on 1D translational, 2D rotational, and 3D Gaussian lattice dynamics modeling tasks, the approach achieves significantly improved reconstruction accuracy and more physically plausible motion deformation. Moreover, it demonstrates superior cross-task generalization compared to prior methods.
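The core decomposition above can be made concrete with a toy sketch (our own illustrative example, not the paper's implementation): given an inferred parent for each motion node and a local residual per node, a node's absolute motion is its parent-inherited motion plus its residual, accumulated down the hierarchy.

```python
import numpy as np

def reconstruct_absolute(parents, residuals):
    """Toy 1D-translation example of hierarchical motion composition.

    parents[i]   -- index of node i's parent (-1 for a root)
    residuals[i] -- node i's local motion relative to its parent
    Returns each node's absolute motion: parent-inherited motion + residual.
    """
    n = len(parents)
    absolute = np.zeros(n)

    def resolve(i):
        if parents[i] == -1:          # root: no motion to inherit
            return residuals[i]
        return resolve(parents[i]) + residuals[i]

    for i in range(n):
        absolute[i] = resolve(i)
    return absolute

# Root translates +2.0; its child inherits that and adds +0.5 locally;
# the grandchild inherits 2.5 and subtracts 0.25.
parents = [-1, 0, 1]
residuals = np.array([2.0, 0.5, -0.25])
absolute = reconstruct_absolute(parents, residuals)
# absolute motions: root 2.0, child 2.5, grandchild 2.25
```

The learning problem is the inverse of this forward pass: infer the parent edges and residuals that best explain observed absolute motions.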
📝 Abstract
Hierarchical structures of motion exist across research fields, including computer vision, graphics, and robotics, where complex dynamics typically arise from coordinated interactions among simpler motion components. Existing methods for modeling such dynamics generally rely on manually defined or heuristic hierarchies with fixed motion primitives, limiting their generalizability across tasks. In this work, we propose a general hierarchical motion modeling method that learns structured, interpretable motion relationships directly from data. Our method represents observed motions using graph-based hierarchies, explicitly decomposing global absolute motions into parent-inherited patterns and local motion residuals. We formulate hierarchy inference as a differentiable graph learning problem, where vertices represent elemental motions and directed edges capture learned parent-child dependencies through graph neural networks. We evaluate our hierarchical reconstruction approach on three examples: 1D translational motion, 2D rotational motion, and dynamic 3D scene deformation via Gaussian splatting. Experimental results show that our method recovers the intrinsic motion hierarchy in the 1D and 2D cases, and produces more realistic and interpretable deformations than the baseline on dynamic 3D Gaussian splatting scenes. By providing an adaptable, data-driven hierarchical modeling paradigm, our method offers a formulation applicable to a broad range of motion-centric tasks. Project Page: https://light.princeton.edu/HEIR/
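To make "differentiable graph learning" of parent-child edges tangible, here is a minimal sketch under our own assumptions (the variable names, score matrix, and masking scheme are illustrative, not the paper's API): pairwise parent-affinity scores are turned into soft directed edges with a softmax, so the inherited motion of each node is a differentiable mixture of candidate parents' motions, and the residual is whatever local deformation remains.

```python
import numpy as np

def soft_parent_edges(scores):
    """Turn learned parent-affinity scores into soft directed edges.

    scores[i, j] -- affinity of node j as a candidate parent of node i.
    Self-loops are masked with -inf so a node cannot parent itself;
    a row-wise softmax then yields edge weights that sum to 1 per node.
    """
    masked = scores.astype(float).copy()
    np.fill_diagonal(masked, -np.inf)
    e = np.exp(masked - masked.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical affinities for a 3-node graph (stand-in for GNN outputs).
scores = np.array([[0.0, 3.0, 0.0],
                   [5.0, 0.0, 0.0],
                   [0.0, 4.0, 0.0]])
edges = soft_parent_edges(scores)          # rows sum to 1 (soft edges)

node_motion = np.array([[1.0], [2.0], [3.0]])   # observed absolute motions
inherited = edges @ node_motion            # soft parent-inherited motion
residual = node_motion - inherited         # local deformation left to explain
```

Because every step is a smooth function of `scores`, a reconstruction loss on the residuals can be backpropagated to the edge scores, which is what allows the hierarchy itself to be learned end-to-end rather than fixed by hand.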