MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training

📅 2024-06-04

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

career value

202K/year

🤖 AI Summary

To address weak controllability, low generation quality, slow inference, and variable-length alignment challenges in text-to-motion synthesis, this paper proposes a unified framework. First, learnable activation variables are introduced to enable text-length-adaptive motion sequence generation. Second, an adversarially enhanced latent diffusion model (LDM) is constructed, incorporating Wasserstein adversarial training to improve motion realism. Third, a training-free classifier-free guidance mechanism is designed to support diverse motion editing—including start/end positions and pelvis trajectories. Built upon a joint VAE-LDM architecture, the method enables versatile control without additional fine-tuning. Experiments demonstrate significant improvements: 21.3% reduction in FID (indicating higher fidelity) and 3.2× faster inference speed. Notably, this is the first single-model approach to simultaneously achieve variable-length alignment, strong-constraint editing, and high-fidelity motion synthesis.

Technology Category

Application Category

📝 Abstract

In text-to-motion generation, controllability as well as generation quality and speed has become increasingly critical. The controllability challenges include generating a motion of a length that matches the given textual description and editing the generated motions according to control signals, such as the start-end positions and the pelvis trajectory. In this paper, we propose MoLA, which provides fast, high-quality, variable-length motion generation and can also deal with multiple editing tasks in a single framework. Our approach revisits the motion representation used as inputs and outputs in the model, incorporating an activation variable to enable variable-length motion generation. Additionally, we integrate a variational autoencoder and a latent diffusion model, further enhanced through adversarial training, to achieve high-quality and fast generation. Moreover, we apply a training-free guided generation framework to achieve various editing tasks with motion control inputs. We quantitatively show the effectiveness of adversarial learning in text-to-motion generation, and demonstrate the applicability of our editing framework to multiple editing tasks in the motion domain.

Problem

Research questions and friction points this paper is trying to address.

Enhances text-to-motion generation controllability.

Improves motion generation quality and speed.

Facilitates multiple motion editing tasks efficiently.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Variable-length motion generation via activation variable

Latent diffusion model enhanced by adversarial training

Training-free guided generation for motion editing

🔎 Similar Papers

SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion