ARFlow: Human Action-Reaction Flow Matching with Physical Guidance

📅 2025-03-21

🤖 AI Summary

This work addresses two key limitations of diffusion-based human action–reaction synthesis: (1) reliance on complex noise-to-reaction generators with intricate conditional mechanisms, and (2) physically implausible outputs such as body penetration. We propose ARFlow, a flow-matching framework that maps actions directly to reactions. Our core contributions are: (1) an x₁-prediction formulation in which the network directly outputs the reaction motion instead of a velocity field, which makes explicit constraint enforcement possible; (2) a training-free, gradient-based physical guidance mechanism that suppresses body penetration during sampling; and (3) the integration of flow matching with differentiable physical constraints. Evaluated on NTU120 and Chi3D, the method outperforms existing approaches in FID and motion diversity while significantly reducing body collisions, as measured by two new metrics, Intersection Volume and Intersection Frequency.
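The page does not define Intersection Volume or Intersection Frequency, so the following is only a rough illustration of what such collision metrics can look like. It is a hypothetical proxy, not the paper's definition: each joint of both people is approximated as a sphere of radius `radius` (an assumption; the paper may well use body meshes), the per-frame "volume" is the summed overlap of actor/reactor sphere pairs, and the "frequency" is the fraction of frames with any overlap. All function names are made up for this sketch.

```python
import numpy as np

def lens_volume(dist, radius):
    """Overlap volume of two equal spheres whose centres are `dist` apart.

    Closed form for equal radii: pi * (4r + d) * (2r - d)^2 / 12, and 0 when d >= 2r.
    """
    overlap = np.maximum(0.0, 2.0 * radius - dist)
    return np.pi * (4.0 * radius + dist) * overlap ** 2 / 12.0

def intersection_proxies(reaction, action, radius=0.05):
    """Hypothetical collision proxies for motions shaped (frames, joints, 3).

    Returns (mean per-frame overlap volume, fraction of frames with any overlap).
    Summing pairwise lenses over-counts regions where three or more spheres meet,
    so this is a crude proxy rather than an exact intersection volume.
    """
    diff = reaction[:, :, None, :] - action[:, None, :, :]   # (T, Jr, Ja, 3)
    dist = np.linalg.norm(diff, axis=-1)                      # (T, Jr, Ja)
    per_frame_volume = lens_volume(dist, radius).sum(axis=(1, 2))  # (T,)
    frequency = float(np.mean(per_frame_volume > 0.0))
    volume = float(per_frame_volume.mean())
    return volume, frequency
```

With two frames where the bodies coincide in the first and are far apart in the second, the frequency is 0.5 and the volume is half of one full sphere volume.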

📝 Abstract

Human action-reaction synthesis, a fundamental challenge in modeling causal human interactions, plays a critical role in applications ranging from virtual reality to social robotics. While diffusion-based models have demonstrated promising performance, they exhibit two key limitations for interaction synthesis: reliance on complex noise-to-reaction generators with intricate conditional mechanisms, and frequent physical violations in generated motions. To address these issues, we propose Action-Reaction Flow Matching (ARFlow), a novel framework that establishes direct action-to-reaction mappings, eliminating the need for complex conditional mechanisms. Our approach introduces two key innovations: an x1-prediction method that directly outputs human motions instead of velocity fields, enabling explicit constraint enforcement; and a training-free, gradient-based physical guidance mechanism that effectively prevents body penetration artifacts during sampling. Extensive experiments on NTU120 and Chi3D datasets demonstrate that ARFlow not only outperforms existing methods in terms of Fréchet Inception Distance and motion diversity but also significantly reduces body collisions, as measured by our new Intersection Volume and Intersection Frequency metrics.
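To make the "direct action-to-reaction mapping" and "x1-prediction" ideas concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a straight-line probability path that starts at the observed action (t = 0) and ends at the reaction (t = 1), motions stored as NumPy arrays shaped (frames, joints, 3), and a plain Euler solver. `predict_x1` is a hypothetical stand-in for the trained network; the abstract does not specify the path, solver, or network inputs.

```python
import numpy as np

def make_training_pair(action, reaction, rng):
    """Sample t and the point x_t on the straight path from action (t=0) to reaction (t=1).

    Under x1-prediction the regression target is the reaction itself,
    not the velocity (reaction - action).
    """
    t = float(rng.uniform())
    x_t = (1.0 - t) * action + t * reaction
    return x_t, t, reaction

def x1_to_velocity(x1_hat, x_t, t, eps=1e-4):
    """Recover the straight-path velocity implied by an endpoint prediction.

    With x_t = (1 - t) * x0 + t * x1, the velocity x1 - x0 equals (x1 - x_t) / (1 - t).
    """
    return (x1_hat - x_t) / max(1.0 - t, eps)

def sample_reaction(action, predict_x1, num_steps=10):
    """Euler-integrate from the observed action at t=0 to a reaction at t=1."""
    x = np.array(action, dtype=float)
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x1_hat = predict_x1(x, t)           # hypothetical network output: a full reaction motion
        v = x1_to_velocity(x1_hat, x, t)    # convert it into a velocity for the ODE step
        x = x + (t_next - t) * v
    return x
```

As a sanity check, an oracle predictor that always returns the true reaction keeps every Euler step on the straight line, so the sampler lands exactly on that reaction. Because the source distribution is the action itself, no separate conditioning branch is needed in this sketch, which mirrors the abstract's claim about avoiding complex conditional mechanisms.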

Problem

Research questions and friction points this paper is trying to address.

Modeling causal human interactions for virtual reality and robotics

Overcoming limitations of diffusion-based interaction synthesis methods

Reducing physical violations and body collisions in generated motions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Direct action-to-reaction mapping framework

x₁-prediction for explicit motion output

Training-free gradient-based physical guidance
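One way to picture training-free, gradient-based physical guidance on an x₁ prediction is sketched below. This is an illustrative assumption, not the paper's guidance objective: each joint is treated as a sphere of radius `radius`, the penetration penalty is the sum of squared sphere overlaps between the reactor's and the actor's joints, and a few gradient steps with an analytic gradient push the predicted reaction out of the actor. The names `penetration_loss_and_grad` and `guide_x1`, and the values of `step_size`, `num_iters`, and `radius`, are hypothetical.

```python
import numpy as np

def penetration_loss_and_grad(reaction, action, radius=0.05, eps=1e-8):
    """Sphere-overlap penalty and its gradient w.r.t. the reaction joints.

    reaction: (T, Jr, 3), action: (T, Ja, 3). The loss is
    sum over pairs of max(0, 2r - ||R_i - A_j||)^2.
    """
    diff = reaction[:, :, None, :] - action[:, None, :, :]   # (T, Jr, Ja, 3)
    dist = np.linalg.norm(diff, axis=-1)                      # (T, Jr, Ja)
    overlap = np.maximum(0.0, 2.0 * radius - dist)            # > 0 where spheres intersect
    loss = float(np.sum(overlap ** 2))
    # d/dR_i of overlap^2 = -2 * overlap * (R_i - A_j) / dist
    coef = -2.0 * overlap / (dist + eps)                      # (T, Jr, Ja)
    grad = np.sum(coef[..., None] * diff, axis=2)             # (T, Jr, 3)
    return loss, grad

def guide_x1(x1_hat, action, step_size=0.5, num_iters=5, radius=0.05):
    """Nudge a predicted reaction out of the actor; no network weights are touched."""
    x = np.array(x1_hat, dtype=float)
    for _ in range(num_iters):
        loss, grad = penetration_loss_and_grad(x, action, radius)
        if loss == 0.0:
            break                                             # already collision-free
        x = x - step_size * grad                              # plain gradient descent step
    return x
```

Because the network outputs a pose rather than a velocity, a guided sampler could apply `guide_x1` to each prediction and only then convert the corrected pose into a velocity. With `step_size=0.5`, a single overlapping joint pair is separated to roughly touching distance in one step, since the step moves the joint away by exactly its overlap (up to `eps`).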
