Unleashing Guidance Without Classifiers for Human-Object Interaction Animation

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Generating realistic human-object interaction (HOI) animations requires jointly modeling human dynamics and object geometry, yet existing diffusion-based approaches often rely on hand-crafted contact priors. This work proposes LIGHT, a framework built on diffusion forcing that decomposes the representation into modality-specific components and denoises them on asynchronous schedules, so that cleaner components guide noisier ones and guidance arises from the denoising pace itself, without auxiliary classifiers. This guidance is inherently contact-aware and is strengthened by augmenting training with diverse synthetic object geometries, which encourages contact semantics that are invariant to object shape. In experiments, it mirrors the benefits of contact priors more effectively than conventional classifier-free guidance, with higher contact fidelity, more realistic interactions, and stronger generalization to unseen objects and tasks.
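For context, conventional classifier-free guidance, the baseline this summary compares against, blends a conditional and an unconditional noise prediction at each sampling step. The following is a minimal sketch of that standard rule only, not of LIGHT; the function name and the toy arrays are hypothetical and chosen for illustration.

```python
import numpy as np

def classifier_free_guidance(eps_cond, eps_uncond, scale):
    """Standard classifier-free guidance blend of two noise predictions.

    scale = 0 returns the unconditional prediction, scale = 1 the
    conditional one, and scale > 1 extrapolates toward the condition.
    """
    eps_cond = np.asarray(eps_cond, dtype=float)
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Toy predictions standing in for the outputs of a denoising network.
eps_c = np.array([1.0, 2.0])
eps_u = np.array([0.0, 1.0])
guided = classifier_free_guidance(eps_c, eps_u, scale=2.0)
```

This rule needs two network evaluations per step and a hand-tuned scale, which is the kind of externally imposed guidance the summary contrasts with guidance that emerges from the denoising pace.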

📝 Abstract
Generating realistic human-object interaction (HOI) animations remains challenging because it requires jointly modeling dynamic human actions and diverse object geometries. Prior diffusion-based approaches often rely on hand-crafted contact priors or human-imposed kinematic constraints to improve contact quality. We propose LIGHT, a data-driven alternative in which guidance emerges from the denoising pace itself, reducing dependence on manually designed priors. Building on diffusion forcing, we factor the representation into modality-specific components and assign individualized noise levels with asynchronous denoising schedules. In this paradigm, cleaner components guide noisier ones through cross-attention, yielding guidance without auxiliary classifiers. We find that this data-driven guidance is inherently contact-aware, and can be enhanced when training is augmented with a broad spectrum of synthetic object geometries, encouraging invariance of contact semantics to geometric diversity. Extensive experiments show that pace-induced guidance more effectively mirrors the benefits of contact priors than conventional classifier-free guidance, while achieving higher contact fidelity, more realistic HOI generation, and stronger generalization to unseen objects and tasks.
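The idea of individualized noise levels with asynchronous denoising schedules can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the two component names (`object`, `human`), the choice of which component leads, the fixed step delay, and the linear toy noise schedule are all hypothetical.

```python
import numpy as np

def async_schedule(num_levels, delays):
    """Per-component noise levels for every sampling iteration.

    Level `num_levels` means pure noise and level 0 means clean.
    A component with delay d starts denoising d iterations late, so
    components with smaller delays are always at least as clean.
    """
    total = num_levels + max(delays.values())
    schedule = []
    for i in range(total + 1):
        schedule.append({
            name: int(np.clip(num_levels + d - i, 0, num_levels))
            for name, d in delays.items()
        })
    return schedule

def add_noise(x, level, num_levels, rng):
    """Variance-preserving noising with a toy linear schedule (assumed)."""
    alpha_bar = 1.0 - level / num_levels
    noise = rng.standard_normal(x.shape)
    return np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * noise

rng = np.random.default_rng(0)
num_levels = 4

# Training side: each component draws its own independent noise level.
clean = {"object": np.ones((3, 2)), "human": np.ones((3, 5))}
levels = {name: int(rng.integers(0, num_levels + 1)) for name in clean}
noisy = {name: add_noise(x, levels[name], num_levels, rng)
         for name, x in clean.items()}

# Sampling side: the hypothetical "object" component leads by two steps.
sched = async_schedule(num_levels, {"object": 0, "human": 2})
for step in sched:
    print(step)
```

In a full model, a denoiser would receive every component together with its own noise level at each iteration, so the cleaner component could inform the noisier one through cross-attention; that network is omitted here.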
Problem

Research questions and friction points this paper is trying to address.

human-object interaction
animation generation
contact modeling
object geometry
realistic animation
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion forcing
asynchronous denoising
classifier-free guidance
contact-aware generation
human-object interaction
Authors

Ziyin Wang
University of Illinois Urbana-Champaign

Sirui Xu
University of Illinois at Urbana-Champaign
Computer Vision, Machine Learning, Virtual Humans, Character Animation, Human-Object Interaction

Chuan Guo
Snap Inc.

Bing Zhou
Snap Research
Human Motion Generation, Video Generation, Human Computer Interaction

Jiangshan Gong
University of Illinois Urbana-Champaign

Jian Wang
Snap Inc.
Computer vision, signal processing

Yu-Xiong Wang
University of Illinois Urbana-Champaign

Liang-Yan Gui
University of Illinois Urbana-Champaign