DailyArt: Discovering Articulation from Single Static Images via Latent Dynamics

📅 2026-04-08
🤖 AI Summary
Inferring the articulated kinematic structure of an object from a single static, closed-state image is challenging because critical motion cues are often occluded. This work proposes a synthesis-guided inference framework: it first generates a maximally opened synthetic configuration from the same viewpoint to expose joint information, then estimates the complete set of joint parameters end to end from the discrepancy between the observed and synthetic states. The method requires no multi-state observations, object templates, multi-view inputs, or explicit part annotations at test time, yet it recovers all articulation joints simultaneously and enables part-level novel-state image synthesis conditioned on the estimated joints. Experiments show strong performance in both joint estimation and controllable image generation.
📝 Abstract
Articulated objects are essential for embodied AI and world models, yet inferring their kinematics from a single closed-state image remains challenging because crucial motion cues are often occluded. Existing methods either require multi-state observations or rely on explicit part priors, retrieval, or other auxiliary inputs that partially expose the structure to be inferred. In this work, we present DailyArt, which formulates articulated joint estimation from a single static image as a synthesis-mediated reasoning problem. Instead of directly regressing joints from a heavily occluded observation, DailyArt first synthesizes a maximally articulated opened state under the same camera view to expose articulation cues, and then estimates the full set of joint parameters from the discrepancy between the observed and synthesized states. Using a set-prediction formulation, DailyArt recovers all joints simultaneously without requiring object-specific templates, multi-view inputs, or explicit part annotations at test time. Taking the estimated joints as conditions, the framework further supports part-level novel state synthesis as a downstream capability. Extensive experiments show that DailyArt achieves strong performance in articulated joint estimation and supports part-level novel state synthesis conditioned on joints. The project page is available at https://rangooo123.github.io/DaliyArt.github.io/.
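The abstract mentions a set-prediction formulation but gives no details. As a rough illustration only, the sketch below shows one common way such formulations pair an unordered set of predicted joints with reference joints: an optimal one-to-one matching under a pairwise cost. Everything here is an assumption rather than DailyArt's actual method: the joint fields (`type`, `axis`, `pivot`), the cost terms, the `type_penalty` weight, and the exhaustive search (practical only for a handful of joints) are hypothetical choices made for clarity.

```python
from itertools import permutations
from math import sqrt


def axis_cost(a, b):
    """1 - |cos(angle)| between two axis directions (sign-invariant)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return 1.0 - abs(dot) / (norm_a * norm_b)


def pivot_cost(p, q):
    """Euclidean distance between two pivot points."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def pair_cost(pred, ref, type_penalty=1.0):
    """Hypothetical cost of pairing one predicted joint with one reference joint."""
    cost = axis_cost(pred["axis"], ref["axis"]) + pivot_cost(pred["pivot"], ref["pivot"])
    if pred["type"] != ref["type"]:
        cost += type_penalty  # assumed fixed penalty for a joint-type mismatch
    return cost


def match_joints(preds, refs):
    """Exhaustively find the cheapest one-to-one assignment.

    Returns (assignment, total_cost), where assignment[i] is the index of the
    prediction matched to refs[i]. Unmatched predictions are simply left over.
    """
    assert len(preds) >= len(refs), "need at least as many predictions as references"
    best = None
    for perm in permutations(range(len(preds)), len(refs)):
        total = sum(pair_cost(preds[p], refs[r]) for r, p in enumerate(perm))
        if best is None or total < best[1]:
            best = (perm, total)
    return best


# Toy example with made-up joints: three predicted slots, two reference joints.
preds = [
    {"type": "prismatic", "axis": (0, 0, 1), "pivot": (0, 0, 0)},
    {"type": "revolute", "axis": (0, 1, 0), "pivot": (1, 0, 0)},
    {"type": "revolute", "axis": (1, 0, 0), "pivot": (5, 5, 5)},
]
refs = [
    {"type": "revolute", "axis": (0, -1, 0), "pivot": (1, 0, 0)},
    {"type": "prismatic", "axis": (0, 0, 1), "pivot": (0, 0, 0.1)},
]
assignment, total = match_joints(preds, refs)
print(assignment)  # (1, 0): refs[0] pairs with preds[1], refs[1] with preds[0]
```

In this toy case the flipped axis `(0, -1, 0)` still matches `(0, 1, 0)` at zero axis cost, because a joint axis is treated as a line rather than a directed vector. For larger sets, a polynomial-time assignment solver would replace the exhaustive search.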
Problem

Research questions and friction points this paper is trying to address.

articulated objects
kinematics inference
single image
occlusion
joint estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

articulated object
single-image inference
latent dynamics
synthesis-mediated reasoning
joint estimation