🤖 AI Summary
Generating realistic human-object interaction animations requires jointly modeling human dynamics and object geometry, yet existing approaches often rely on hand-crafted contact priors. This work proposes LIGHT, a diffusion-forcing framework that decomposes the representation into modality-specific components and denoises them on asynchronous schedules, so that guidance signals emerge from the denoising rhythm itself and contact-aware synthesis is achieved without external classifiers. Trained with a broad range of synthetic object geometries, LIGHT makes contact semantics more invariant to object shape and outperforms conventional classifier-free guidance in contact fidelity, interaction realism, and generalization to unseen objects and tasks.
📝 Abstract
Generating realistic human-object interaction (HOI) animations remains challenging because it requires jointly modeling dynamic human actions and diverse object geometries. Prior diffusion-based approaches often rely on hand-crafted contact priors or human-imposed kinematic constraints to improve contact quality. We propose LIGHT, a data-driven alternative in which guidance emerges from the denoising pace itself, reducing dependence on manually designed priors. Building on diffusion forcing, we factor the representation into modality-specific components and assign each component an individualized noise level under an asynchronous denoising schedule. In this paradigm, cleaner components guide noisier ones through cross-attention, yielding guidance without auxiliary classifiers. We find that this data-driven guidance is inherently contact-aware and can be further enhanced by augmenting training with a broad spectrum of synthetic object geometries, encouraging contact semantics to remain invariant to geometric diversity. Extensive experiments show that pace-induced guidance mirrors the benefits of contact priors more effectively than conventional classifier-free guidance, while achieving higher contact fidelity, more realistic HOI generation, and stronger generalization to unseen objects and tasks.
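The core mechanism in the abstract — per-modality noise levels on asynchronous schedules, with the cleaner component guiding the noisier one — can be illustrated with a toy sketch. Everything below (the linear schedule, the `LEAD` offset, the toy linear denoiser, and the scalar guidance gate standing in for cross-attention) is an illustrative assumption, not the paper's actual implementation:

```python
import numpy as np

# Toy sketch of asynchronous, per-modality denoising (diffusion-forcing style).
# The schedules, the linear "denoiser", and the guidance gate are assumptions
# made for illustration only.

rng = np.random.default_rng(0)
T, D, LEAD = 40, 4, 10                      # steps, feature dim, human lead

clean = {"human": np.ones(D), "object": -np.ones(D)}  # stand-in clean samples
x = {m: rng.normal(size=D) for m in clean}            # initialise from noise


def sigma(step: int, lead: int) -> float:
    """Noise level in [0, 1]; a positive lead makes this modality cleaner sooner."""
    return float(np.clip((T - step - lead) / T, 0.0, 1.0))


for step in range(T + LEAD):
    sig = {"human": sigma(step, LEAD), "object": sigma(step, 0)}
    cleaner = min(sig, key=sig.get)         # least-noisy modality acts as guide
    for m in x:
        drift = 0.2 * (clean[m] - x[m])     # toy denoiser pulls toward clean data
        # the noisier modality's step is gated by how clean its guide is,
        # a scalar stand-in for cross-attention guidance
        gate = 1.0 if m == cleaner else 1.0 - sig[cleaner]
        x[m] = x[m] + gate * (1.0 - sig[m]) * drift

errors = {m: float(np.linalg.norm(x[m] - clean[m])) for m in x}
print(errors)  # the leading ("human") modality ends up closer to its target
```

The point of the sketch is the schedule asymmetry: because the "human" component denoises ahead of the "object" component, it is always at least as clean and can serve as a guidance signal, with no auxiliary classifier involved.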