MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training

📅 2024-06-04
🏛️ arXiv.org
📈 Citations: 2
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address weak controllability, low generation quality, slow inference, and variable-length alignment challenges in text-to-motion synthesis, this paper proposes a unified framework. First, learnable activation variables are introduced so that the length of a generated motion sequence adapts to the textual description. Second, an adversarially enhanced latent diffusion model (LDM) is constructed, incorporating Wasserstein adversarial training to improve motion realism. Third, a training-free guided generation mechanism supports diverse motion editing, including start-end positions and pelvis trajectories. Built upon a joint VAE-LDM architecture, the method enables versatile control without additional fine-tuning. Experiments demonstrate significant improvements: a 21.3% reduction in FID (indicating higher fidelity) and 3.2× faster inference. Notably, this is reported as the first single-model approach to simultaneously achieve variable-length alignment, strong-constraint editing, and high-fidelity motion synthesis.
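To make the activation-variable idea concrete, here is a minimal sketch (the function names, tensor shapes, and the 0.5 threshold are illustrative assumptions, not details from the paper): each frame of a fixed-length motion tensor carries an extra activation channel, and generated sequences are trimmed where the activation switches off.

```python
import torch

def pad_with_activation(motion: torch.Tensor, max_len: int) -> torch.Tensor:
    """Append a binary activation channel marking valid frames, then zero-pad.

    motion: (T, D) motion features with true length T <= max_len.
    Returns: (max_len, D + 1); last channel is 1 for real frames, 0 for padding.
    """
    T, D = motion.shape
    padded = torch.zeros(max_len, D + 1)
    padded[:T, :D] = motion
    padded[:T, D] = 1.0  # activation channel
    return padded

def trim_by_activation(generated: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Recover a variable-length motion from a generated fixed-length tensor.

    generated: (max_len, D + 1) with a continuous activation in the last channel.
    Keeps the contiguous prefix of frames whose activation exceeds the threshold.
    """
    active = (generated[:, -1] > threshold).float()
    length = int(active.cumprod(dim=0).sum().item())  # stop at first inactive frame
    return generated[:length, :-1]
```

This way the model always operates on fixed-size tensors, while the activation channel carries the length information end to end.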

๐Ÿ“ Abstract
In text-to-motion generation, controllability as well as generation quality and speed has become increasingly critical. The controllability challenges include generating a motion of a length that matches the given textual description and editing the generated motions according to control signals, such as the start-end positions and the pelvis trajectory. In this paper, we propose MoLA, which provides fast, high-quality, variable-length motion generation and can also deal with multiple editing tasks in a single framework. Our approach revisits the motion representation used as inputs and outputs in the model, incorporating an activation variable to enable variable-length motion generation. Additionally, we integrate a variational autoencoder and a latent diffusion model, further enhanced through adversarial training, to achieve high-quality and fast generation. Moreover, we apply a training-free guided generation framework to achieve various editing tasks with motion control inputs. We quantitatively show the effectiveness of adversarial learning in text-to-motion generation, and demonstrate the applicability of our editing framework to multiple editing tasks in the motion domain.
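As a rough sketch of what "enhanced through adversarial training" can look like in a VAE/latent-diffusion pipeline (the critic architecture, temporal pooling, and loss weighting below are assumptions; the paper's exact formulation may differ), a Wasserstein critic scores real versus generated motions, and the generator side minimizes the negated critic score alongside its usual reconstruction objective:

```python
import torch
import torch.nn as nn

class MotionCritic(nn.Module):
    """Wasserstein critic scoring motion clips (architecture is illustrative)."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, D); mean-pool over time to get one score per clip.
        return self.net(x.mean(dim=1))

def critic_loss(critic: MotionCritic, real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    # Wasserstein objective: push real scores up, generated scores down.
    return critic(fake.detach()).mean() - critic(real).mean()

def generator_adv_loss(critic: MotionCritic, fake: torch.Tensor) -> torch.Tensor:
    # Generator (e.g., the VAE decoder) is rewarded for fooling the critic;
    # total loss would be reconstruction + KL + lambda_adv * this term,
    # where lambda_adv is an assumed hyperparameter.
    return -critic(fake).mean()
```

Note that a Wasserstein objective also needs a Lipschitz constraint on the critic (weight clipping or a gradient penalty); it is omitted here for brevity.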
Problem

Research questions and friction points this paper is trying to address.

Weak controllability in text-to-motion generation, e.g., producing a motion whose length matches the textual description.
Generation quality and inference speed that fall short of application needs.
Editing generated motions against control signals (start-end positions, pelvis trajectory) without a separate model per task.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Variable-length motion generation via activation variable
Latent diffusion model enhanced by adversarial training
Training-free guided generation for motion editing (see the sketch after this list)
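
A minimal sketch of such training-free guidance (all names here, e.g. `control_cost` and `guidance_scale`, are hypothetical): at each reverse-diffusion step, the latent is nudged down the gradient of a differentiable control cost computed on the decoded motion, so constraints like start-end positions can be imposed without any retraining.

```python
import torch

@torch.no_grad()
def guided_sample_step(z_t, t, denoiser, decode, control_cost, guidance_scale=1.0):
    """One guided reverse-diffusion step (illustrative sketch).

    z_t:          current noisy latent
    denoiser:     performs the ordinary reverse update z_t -> z_{t-1}
    decode:       frozen VAE decoder mapping a latent to a motion sequence
    control_cost: differentiable scalar cost on the decoded motion, e.g. the
                  MSE between decoded and target pelvis trajectories
    """
    z_prev = denoiser(z_t, t)  # ordinary reverse-diffusion update

    # Guidance: descend the control cost through the decoder; no training needed.
    with torch.enable_grad():
        z = z_prev.detach().requires_grad_(True)
        cost = control_cost(decode(z))
        grad = torch.autograd.grad(cost, z)[0]
    return z_prev - guidance_scale * grad
```

A pelvis-trajectory edit, for instance, would define `control_cost` as the mean squared error between the decoded pelvis positions and the desired path; a start-end edit would penalize only the first and last frames.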