AI Summary
This work addresses the challenge of generating high-fidelity, natural, and diverse motion sequences from extremely sparse keyframes, a scenario where existing methods struggle to ensure accuracy, temporal continuity, and variation at the same time. The authors propose a framework that integrates Implicit Neural Representations (INRs) with Latent Diffusion Models (LDMs), which they present as the first use of continuous implicit representations in the diffusion generation paradigm. By sampling INR parameters under keyframe constraints, the method reconstructs plausible intermediate motions directly from minimal input. According to the authors' experiments, this improves generation quality in sparse-keyframe settings: the synthesized sequence adheres to the given keyframes while remaining smooth and semantically coherent.
Abstract
Recent advances in generative models have yielded impressive progress on motion in-betweening, allowing for more complex, varied, and realistic motion transitions. However, existing methods still exhibit noticeable limitations in preserving keyframe information and ensuring motion continuity. In this paper, we propose a novel pipeline and sampling optimization strategy for latent diffusion models (LDMs) based on motion implicit neural representations (INRs). By establishing a mapping between INRs and sparse spatial or temporal information within latent diffusion, our model can sample the INR parameters from extremely sparse and ambiguous keyframe data and reconstruct plausible and smooth motions from the manifold. Our experiments demonstrate the superior performance of our model, which significantly improves motion generation quality in scenarios with few keyframes while ensuring both keyframe accuracy and diversity of in-between motions.
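To make the idea of a motion INR concrete, the following is a minimal sketch, not the authors' implementation. It assumes a motion is represented by a small network that maps a normalized time stamp in [0, 1] to a pose vector, so that one set of parameters defines the whole sequence. All names (`init_params`, `encode_time`, `decode_pose`, `keyframe_error`), the sinusoidal time encoding, and the layer sizes are hypothetical choices for illustration. The random weights stand in for parameters that, in the paper, would be sampled by the latent diffusion model under keyframe constraints.

```python
import math
import random


def init_params(seed, n_freqs=4, hidden=16, pose_dim=3):
    """Draw random INR weights (a stand-in for diffusion-sampled parameters)."""
    rng = random.Random(seed)
    in_dim = 2 * n_freqs

    def mat(rows, cols):
        scale = 1.0 / math.sqrt(cols)
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

    return {
        "n_freqs": n_freqs,
        "w1": mat(hidden, in_dim),
        "b1": [0.0] * hidden,
        "w2": mat(pose_dim, hidden),
        "b2": [0.0] * pose_dim,
    }


def encode_time(t, n_freqs):
    """Sinusoidal features of a normalized time t in [0, 1] (assumed encoding)."""
    feats = []
    for k in range(n_freqs):
        angle = (2.0 ** k) * math.pi * t
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def decode_pose(params, t):
    """Evaluate the implicit motion function at one continuous time stamp."""
    x = encode_time(t, params["n_freqs"])
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(params["w1"], params["b1"])]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(params["w2"], params["b2"])]


def keyframe_error(params, keyframes):
    """Mean squared error between decoded poses and (time, pose) keyframes."""
    total, count = 0.0, 0
    for t, target in keyframes:
        pred = decode_pose(params, t)
        total += sum((p - q) ** 2 for p, q in zip(pred, target))
        count += len(target)
    return total / count


params = init_params(seed=0)
# The same parameters can be decoded at any temporal resolution.
motion_30 = [decode_pose(params, i / 29) for i in range(30)]
motion_120 = [decode_pose(params, i / 119) for i in range(120)]
# Two sparse keyframes, here taken from the decoded motion itself.
keyframes = [(0.0, motion_30[0]), (1.0, motion_30[-1])]
print(len(motion_30), len(motion_120), keyframe_error(params, keyframes))
```

Because the representation is continuous in time, the number of output frames is a decoding choice rather than a property of the parameters. A measure such as `keyframe_error` is one plausible way to express how closely a sampled parameter set honors the given keyframes; the constraint actually used in the paper may differ.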