AI Summary
This work addresses the challenges of multi-person collaborative manipulation: synchronizing motion, avoiding unnatural postures, and keeping interactions with the shared object stable. Existing methods often overlook these issues because they focus primarily on single-agent scenarios or neglect payload dynamics. To overcome these limitations, we propose the first unified flow-matching framework that integrates affordance-guided manipulation strategies, an adversarial interaction prior that improves the realism of individual poses and interpersonal coordination, and a stability-driven simulation. The simulation refines unstable interaction states through sampling-based optimization and adjusts the vector field regression accordingly. Extensive evaluations show that our method outperforms state-of-the-art human-object interaction baselines in contact accuracy, interpenetration rate, and fidelity to natural motion distributions, enabling stable and physically plausible human-human-object collaborative motion synthesis.
Abstract
Co-manipulation requires multiple humans to synchronize their motions with a shared object while ensuring reasonable interactions, maintaining natural poses, and preserving stable states. However, most existing motion generation approaches are designed for single-character scenarios or fail to account for payload-induced dynamics. In this work, we propose a flow-matching framework that ensures the generated co-manipulation motions align with the intended goals while maintaining naturalness and effectiveness. Specifically, we first introduce a generative model that derives explicit manipulation strategies from the object's affordance and spatial configuration; these strategies guide the motion flow toward successful manipulation. To improve motion quality, we then design an adversarial interaction prior that promotes natural individual poses and realistic inter-person interactions during co-manipulation. We also incorporate a stability-driven simulation into the flow-matching process, which refines unstable interaction states through sampling-based optimization and directly adjusts the vector field regression to promote more effective manipulation. The experimental results demonstrate that our method achieves higher contact accuracy, lower penetration, and better distributional fidelity than state-of-the-art human-object interaction baselines. The code is available at https://github.com/boycehbz/StaCOM.
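For readers unfamiliar with the term, the vector field regression behind flow matching can be illustrated with a generic toy sketch. This is not the authors' implementation: it assumes the common straight-line probability path between a noise sample and a data sample, and it omits the affordance guidance, the adversarial prior, and the stability-driven simulation described above. All function names (`interpolate`, `flow_matching_loss`, `euler_sample`, `oracle_velocity`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 to data x1 at time t in [0, 1]."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    return (1.0 - t) * x0 + t * x1


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between a predicted velocity and the path velocity x1 - x0.

    Assumption: straight-line path, so the regression target is constant in t.
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    x_t = interpolate(x0, x1, t)
    target = x1 - x0
    pred = velocity_fn(x_t, t)
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))


def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity_fn(x, t)
    return x


# Toy usage: every data sample is the same point, so the exact velocity field
# is known in closed form and can stand in for a trained network.
goal = np.array([[1.0, -2.0, 0.5]])


def oracle_velocity(x, t):
    # Exact field for a single target point; only valid for t < 1.
    return (goal - x) / (1.0 - t)


rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 3))
times = rng.uniform(0.0, 0.9, size=4)
loss = flow_matching_loss(oracle_velocity, noise, np.repeat(goal, 4, axis=0), times)
samples = euler_sample(oracle_velocity, noise, num_steps=10)
```

In a real system the velocity function would be a learned network conditioned on the object and the other person, and the toy 3-dimensional vectors would be replaced by full motion representations; those details are not specified in the abstract and are left out here.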