🤖 AI Summary
Existing motion style transfer methods predominantly adopt dual-stream architectures, which often neglect the intrinsic correlations between content and style motions, leading to information loss, misalignment, and insufficient modeling of long-range temporal dependencies, and thereby producing distorted and incoherent motions. To address these limitations, we propose SMCD, the first diffusion-based framework conditioned on style motion. We introduce the novel Motion Style Mamba (MSM) module to efficiently capture long-term sequential motion dependencies, and design two content consistency losses to enhance generation stability. Our approach enables disentangled motion feature representation, conditional reconstruction, and high-fidelity style transfer. Extensive qualitative and quantitative evaluations demonstrate that SMCD consistently outperforms state-of-the-art methods, achieving significant improvements in motion realism, temporal coherence, and style fidelity. The method effectively enriches motion diversity and naturalness for virtual human avatars.
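To make the style-conditioning idea concrete, below is a minimal PyTorch sketch of one diffusion training step in which the style motion is encoded and injected into a toy denoiser alongside the timestep embedding. All module names (`StyleConditionedDenoiser`), layer choices, and hyperparameters here are illustrative assumptions, not the SMCD architecture.

```python
import torch
import torch.nn as nn

class StyleConditionedDenoiser(nn.Module):
    """Toy denoiser: predicts the noise added to a content motion,
    conditioned on a style motion embedding and the diffusion timestep.
    Shapes and layers are illustrative, not the paper's design."""
    def __init__(self, pose_dim=63, d_model=256, T=1000):
        super().__init__()
        self.in_proj = nn.Linear(pose_dim, d_model)
        self.style_enc = nn.GRU(pose_dim, d_model, batch_first=True)
        self.t_embed = nn.Embedding(T, d_model)
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.out_proj = nn.Linear(d_model, pose_dim)

    def forward(self, x_t, style, t):
        _, h = self.style_enc(style)             # style motion -> one vector
        cond = h[-1].unsqueeze(1) + self.t_embed(t).unsqueeze(1)
        z = self.in_proj(x_t) + cond             # broadcast condition over frames
        return self.out_proj(self.backbone(z))   # predicted noise, per frame

# One DDPM-style training step on random stand-ins for motion clips.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

model = StyleConditionedDenoiser(T=T)
content = torch.randn(8, 60, 63)   # (batch, frames, pose_dim) content motion
style = torch.randn(8, 60, 63)     # style motion used as the condition
t = torch.randint(0, T, (8,))
noise = torch.randn_like(content)
a = alpha_bar[t].view(-1, 1, 1)
x_t = a.sqrt() * content + (1 - a).sqrt() * noise  # forward diffusion
loss = nn.functional.mse_loss(model(x_t, style, t), noise)
loss.backward()
```

The point of the sketch is only the data flow: the style clip enters as a condition on every denoising step, rather than through a separate stream that must later be fused with the content stream.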
📄 Abstract
Motion style transfer is a significant research direction in multimedia applications. It enables rapid switching between different styles of the same motion for virtual digital humans, vastly increasing the diversity and realism of movements, and it is widely applied in multimedia scenarios such as movies, games, and the Metaverse. However, most current work in this field adopts GANs, which may suffer from instability and convergence issues, making the final generated motion sequences somewhat chaotic and unable to reflect a highly realistic and natural style. To address these problems, we treat style motion as a condition and propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can learn the style features of motion more comprehensively. Moreover, we apply the Mamba model for the first time in the motion style transfer field, introducing the Motion Style Mamba (MSM) module to handle longer motion sequences. In addition, we propose a Diffusion-based Content Consistency Loss and a Content Consistency Loss tailored to the SMCD framework to assist the overall framework's training. Finally, we conduct extensive experiments; the results reveal that our method surpasses state-of-the-art methods in both qualitative and quantitative comparisons and generates more realistic motion sequences.
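The abstract does not detail the MSM module's internals; as a rough illustration of why a Mamba-style design suits long motion sequences, the sketch below implements a minimal gated linear-recurrence (state space) block that mixes information across frames in linear time. The class name, gating, and dimensions are assumptions for exposition; the actual MSM module would build on the full selective-scan Mamba block.

```python
import torch
import torch.nn as nn

class SimpleSSMBlock(nn.Module):
    """Didactic stand-in for a Mamba-style block: a diagonal linear
    recurrence over time with a gated residual output. Each frame's
    update is O(1), so cost grows linearly with sequence length."""
    def __init__(self, d_model=256, d_state=64):
        super().__init__()
        self.log_decay = nn.Parameter(torch.randn(d_state))
        self.B = nn.Linear(d_model, d_state, bias=False)  # input -> state
        self.C = nn.Linear(d_state, d_model, bias=False)  # state -> output
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x):                        # x: (batch, frames, d_model)
        decay = torch.sigmoid(self.log_decay)    # per-state decay in (0, 1)
        h = x.new_zeros(x.size(0), decay.numel())
        ys = []
        for t in range(x.size(1)):               # linear-time scan over frames
            h = decay * h + self.B(x[:, t])      # state carries long history
            ys.append(self.C(h))
        y = torch.stack(ys, dim=1)
        return x + y * torch.sigmoid(self.gate(x))  # gated residual output

# Usage on a long motion-embedding sequence.
block = SimpleSSMBlock()
motion = torch.randn(4, 196, 256)  # (batch, frames, d_model)
out = block(motion)                # same shape, with long-range mixing
```

Because the recurrent state summarizes all past frames at constant cost per step, such blocks scale to much longer motion sequences than quadratic-cost attention, which is the property the MSM module exploits.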