JODA: Composable Joint Dynamics for Articulated Objects

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Articulated objects in simulation and embodied AI are usually specified by geometry and kinematic structure alone, so they lack fine-grained dynamical effects such as frictional holding, detents, soft closing, and snap latching. This work proposes JODA, which represents joint-level dynamics as a structured three-channel field over the joint degree of freedom, explicitly capturing conservative forces, dry friction, and damping. Given visual observations and joint context, a vision-language model proposes interpretable dynamical primitives, which are composed into a single dynamics field. Joint dynamics are thereby formulated as a compact, composable function space that is compatible with differentiable simulation. The field is instantiated with shape-constrained piecewise cubic Hermite interpolation (PCHIP) and can be edited directly or refined by gradient-based optimization, yielding plausible and controllable models of diverse joint behaviors. The framework provides a unified interface for dynamics inference, editing, and optimization; code and example assets are to be released upon publication.

📝 Abstract

Articulated objects used in simulation and embodied AI are typically specified by geometry and kinematic structure, but lack the fine-grained dynamical effects that govern realistic mechanical behavior, such as frictional holding, detents, soft closing, and snap latching. Existing approaches either ignore the detailed structure of dynamics entirely, or use simple models with limited expressiveness. We introduce JODA, a framework for generating joint-level dynamics as a structured three-channel field over the joint degree of freedom, capturing conservative forces, dry friction, and damping. Instantiated using shape-constrained piecewise cubic interpolation (PCHIP), this formulation defines a compact and expressive function space that is both interpretable and compatible with differentiable simulation. Building on this representation, we develop methods for inferring and refining joint dynamics from multimodal inputs. Given visual observations and joint context, a vision-language model proposes structured dynamical primitives, which are composed into a unified dynamics field. The resulting representation supports both direct manipulation and gradient-based refinement. We demonstrate that JODA enables plausible and controllable modeling of diverse joint behaviors, providing a unified interface for inference, editing, and optimization. Code and example assets with their generated profiles will be released upon publication.
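
As an illustration of the three-channel idea described above, the following is a minimal sketch and not the authors' implementation. It assumes that each channel is a function of joint position q stored as values at shared knots, that the joint torque is read as tau(q, qdot) = f_c(q) - f_f(q) * sign(qdot) - f_d(q) * qdot, and that primitives are composed by summing their per-channel contributions at the knots. None of these details are stated in the abstract. The names `Pchip`, `JointField`, `compose`, `hold_friction`, `soft_close`, and `detent` are hypothetical, the sign function is smoothed with `tanh` only to keep the expression differentiable, and the end-knot slopes use a simpler rule than library PCHIP implementations.

```python
from bisect import bisect_right
from math import exp, tanh


class Pchip:
    """Shape-preserving piecewise cubic Hermite interpolant over sorted knots."""

    def __init__(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)
        n = len(self.xs)
        h = [self.xs[i + 1] - self.xs[i] for i in range(n - 1)]
        d = [(self.ys[i + 1] - self.ys[i]) / h[i] for i in range(n - 1)]
        # Simplification (assumption): one-sided secant slopes at the end knots.
        self.m = [d[0]] + [0.0] * (n - 2) + [d[-1]]
        for i in range(1, n - 1):
            if d[i - 1] * d[i] > 0:  # same sign: weighted harmonic mean of secants
                w1, w2 = 2 * h[i] + h[i - 1], h[i] + 2 * h[i - 1]
                self.m[i] = (w1 + w2) / (w1 / d[i - 1] + w2 / d[i])
            # Otherwise the slope stays 0, so a local extremum cannot overshoot.

    def __call__(self, q):
        xs, ys, m = self.xs, self.ys, self.m
        q = min(max(q, xs[0]), xs[-1])  # clamp to the knot range
        i = min(bisect_right(xs, q) - 1, len(xs) - 2)
        h = xs[i + 1] - xs[i]
        t = (q - xs[i]) / h
        # Cubic Hermite basis applied to knot values and knot slopes.
        return ((1 + 2 * t) * (1 - t) ** 2 * ys[i] + t * (1 - t) ** 2 * h * m[i]
                + t * t * (3 - 2 * t) * ys[i + 1] + t * t * (t - 1) * h * m[i + 1])


class JointField:
    """Three channels over joint position: conservative, dry friction, damping."""

    def __init__(self, knots, conservative, friction, damping):
        self.conservative = Pchip(knots, conservative)
        self.friction = Pchip(knots, friction)
        self.damping = Pchip(knots, damping)

    def torque(self, q, qdot, eps=1e-3):
        # tanh(qdot / eps) is a smooth stand-in for sign(qdot) (assumption).
        return (self.conservative(q)
                - self.friction(q) * tanh(qdot / eps)
                - self.damping(q) * qdot)


# Hypothetical primitives: each maps q to (conservative, friction, damping).
def hold_friction(level):
    return lambda q: (0.0, level, 0.0)


def soft_close(q_start, gain):
    # Damping grows as the joint approaches the closed end (q below q_start).
    return lambda q: (0.0, 0.0, gain * max(0.0, q_start - q))


def detent(center, width, strength):
    def prim(q):
        u = (q - center) / width
        return (-strength * u * exp(-u * u), 0.0, 0.0)  # restoring torque
    return prim


def compose(knots, primitives):
    """Sum per-channel primitive contributions at the knots (assumption)."""
    channels = [[0.0] * len(knots) for _ in range(3)]
    for prim in primitives:
        for i, q in enumerate(knots):
            for c, value in enumerate(prim(q)):
                channels[c][i] += value
    return JointField(knots, *channels)


knots = [i * 0.1 for i in range(16)]  # joint positions in rad (illustrative)
field = compose(knots, [hold_friction(0.2), soft_close(0.5, 4.0),
                        detent(1.0, 0.1, 0.5)])
print(field.torque(0.0, -1.0), field.torque(1.0, 1.0))
```

One reason a shape-preserving interpolant suits this setting is that the friction and damping channels stay non-negative between knots whenever their knot values are non-negative, so editing a knot value cannot introduce spurious negative damping through overshoot.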

Problem

Research questions and friction points this paper is trying to address.

articulated objects

joint dynamics

fine-grained dynamics

realistic mechanical behavior

dynamical modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

composable dynamics

articulated objects

differentiable simulation