🤖 AI Summary
Existing group motion generation methods struggle to achieve scalability, physical plausibility, and fine-grained control at the same time: incremental diffusion models conditioned on a single shared prompt often oversimplify interactions and do not explicitly model orientation, velocity, or spatial relationships. This paper proposes Person-Interaction Noise Optimization (PINO), a training-free noise optimization framework that decomposes complex group motion into a sequence of semantically coherent pairwise interactions and synthesizes them step by step with a pretrained two-person diffusion model. Physics-based penalties, such as collision avoidance, suppress interpenetration artifacts, and the same optimization supports motion customization along several dimensions (e.g., speed, direction, spacing). Evaluation across diverse scenarios shows improved visual realism, physical plausibility, and controllability for long-horizon, large-group interactions, with no additional training or fine-tuning of the underlying model.
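The decomposition step can be pictured as turning a set of per-pair prompts into an ordered generation schedule. The sketch below is only an illustration under stated assumptions: the breadth-first ordering, the function name `pairwise_schedule`, and the example prompts are hypothetical and are not taken from the paper, which is not quoted here on how it orders pairs.

```python
from collections import deque

def pairwise_schedule(interactions, root):
    """Order pairwise interactions so that each step adds one new character.

    `interactions` maps a pair (a, b) to that pair's own text prompt.
    Returns (anchor, new_character, prompt) steps, where `anchor` has
    already been generated and `new_character` is conditioned on it.
    Breadth-first ordering is an assumption made for this sketch.
    """
    # Build an undirected adjacency list from the pair -> prompt mapping.
    neighbors = {}
    for (a, b), prompt in interactions.items():
        neighbors.setdefault(a, []).append((b, prompt))
        neighbors.setdefault(b, []).append((a, prompt))

    steps, done, queue = [], {root}, deque([root])
    while queue:
        anchor = queue.popleft()
        for other, prompt in neighbors.get(anchor, []):
            if other not in done:
                done.add(other)
                steps.append((anchor, other, prompt))
                queue.append(other)
    return steps

# Hypothetical four-person scene, each pair with its own prompt.
group = {
    ("A", "B"): "two people shake hands",
    ("B", "C"): "one person waves at the other",
    ("A", "D"): "two people walk side by side",
}
schedule = pairwise_schedule(group, root="A")
for anchor, new, prompt in schedule:
    print(f"generate {new} conditioned on {anchor}: {prompt}")
```

Giving every pair its own prompt is what distinguishes this setup from conditioning the whole group on one shared prompt; a two-person model would then be invoked once per scheduled step.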
📝 Abstract
Generating realistic group interactions involving multiple characters remains challenging because complexity grows with group size. While existing conditional diffusion models incrementally generate motions by conditioning on previously generated characters, they rely on a single shared prompt, which limits nuanced control and leads to overly simplified interactions. In this paper, we introduce Person-Interaction Noise Optimization (PINO), a novel, training-free framework designed for generating realistic and customizable interactions among groups of arbitrary size. PINO decomposes complex group interactions into semantically relevant pairwise interactions, and leverages pretrained two-person interaction diffusion models to incrementally compose group interactions. To ensure physical plausibility and avoid common artifacts such as overlapping or penetration between characters, PINO employs physics-based penalties during noise optimization. This approach allows precise user control over character orientation, speed, and spatial relationships without additional training. Comprehensive evaluations demonstrate that PINO generates visually realistic, physically coherent, and adaptable multi-person interactions suitable for diverse animation, gaming, and robotics applications.
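The idea of optimizing noise against a physics-based penalty while the generator stays frozen can be sketched on a toy problem. Everything named here is an assumption for illustration: `toy_generator` is a stand-in for a pretrained two-person diffusion sampler, `penetration_penalty` is a crude distance hinge rather than the paper's actual penalty, and finite differences replace backpropagation through the sampler.

```python
import math

def toy_generator(noise):
    """Stand-in for a frozen two-person sampler (NOT the real model).

    Maps a flat noise vector [x0, y0, x1, y1, ...] to per-frame 2D root
    positions of the newly added character.
    """
    return [(0.5 * noise[i], 0.5 * noise[i + 1]) for i in range(0, len(noise), 2)]

def penetration_penalty(new_traj, other_traj, min_dist=0.6):
    """Squared hinge on root distances below `min_dist` (hypothetical proxy)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(new_traj, other_traj):
        gap = min_dist - math.hypot(x1 - x2, y1 - y2)
        total += max(0.0, gap) ** 2
    return total

def optimize_noise(noise, others, steps=200, lr=0.5, eps=1e-4):
    """Gradient descent on the noise only; generator weights never change."""
    noise = list(noise)

    def loss(z):
        traj = toy_generator(z)
        return sum(penetration_penalty(traj, o) for o in others)

    for _ in range(steps):
        base = loss(noise)
        if base == 0.0:
            break  # no remaining overlap with any existing character
        # Forward finite differences stand in for backprop through the sampler.
        grad = []
        for i in range(len(noise)):
            bumped = noise[:i] + [noise[i] + eps] + noise[i + 1:]
            grad.append((loss(bumped) - base) / eps)
        noise = [z - lr * g for z, g in zip(noise, grad)]
    return noise

# Character A stands at the origin for three frames; B starts almost on top of A.
traj_a = [(0.0, 0.0)] * 3
z_init = [0.2, 0.1, 0.1, -0.2, -0.1, 0.1]
z_opt = optimize_noise(z_init, others=[traj_a])

before = penetration_penalty(toy_generator(z_init), traj_a)
after = penetration_penalty(toy_generator(z_opt), traj_a)
print(f"penalty before: {before:.4f}, after: {after:.4f}")
```

Because only the noise vector is updated, further terms (for example a hypothetical target-speed or facing-direction cost) could be added to `loss` without retraining anything, which is the sense in which such a scheme is training-free.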