Controllable Text-to-Motion Generation via Modular Body-Part Phase Control

📅 2026-03-20

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses a challenge in text-to-motion generation: localized editing of specific body parts often compromises the global coherence of the resulting motion. The authors propose a plug-and-play, modular phase-control framework that models the latent motion channels of individual body parts as sinusoidal phase signals, each summarized compactly and interpretably by four scalar parameters: amplitude, frequency, phase shift, and offset. A Phase ControlNet branch, kept separate from the diffusion or flow backbone, injects these control signals through residual feature modulation. This enables fine-grained editing of motion magnitude, speed, and timing while preserving overall motion coherence. Experiments on both diffusion- and flow-based models indicate that the method improves controllability, predictability, and generation quality in interactive motion editing scenarios.
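To make the four-parameter interface concrete, here is a minimal sketch, assuming a DeepPhase-style single-sinusoid model x(t) ≈ A·sin(2π(f·t − s)) + b fitted to one latent channel from its strongest DFT bin. The function names `fit_phase_params` and `synthesize`, the 30 fps frame rate, and the DFT-based fitting are illustrative assumptions, not the authors' extraction procedure.

```python
import cmath
import math


def fit_phase_params(x, fps):
    """Fit (amplitude, frequency, phase_shift, offset) to one latent channel.

    Assumed model (DeepPhase-style, not necessarily the paper's):
        x(t) ~= A * sin(2*pi*(f*t - s)) + b,  with t in seconds.
    f is taken from the strongest non-DC DFT bin below Nyquist.
    """
    n = len(x)
    offset = sum(x) / n  # b: the channel's mean (DC component)
    centered = [v - offset for v in x]
    best_k, best_coef = 1, 0j
    for k in range(1, n // 2):
        # Naive DFT coefficient for bin k (fine for short clips).
        coef = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coef) > abs(best_coef):
            best_k, best_coef = k, coef
    amplitude = 2.0 * abs(best_coef) / n  # A: the +f bin carries A * n / 2
    frequency = best_k * fps / n  # f in Hz
    # For A*sin(theta + psi), the positive-frequency bin has angle psi - pi/2.
    psi = cmath.phase(best_coef) + math.pi / 2
    phase_shift = (-psi / (2 * math.pi)) % 1.0  # s in cycles, wrapped to [0, 1)
    return amplitude, frequency, phase_shift, offset


def synthesize(params, num_frames, fps):
    """Rebuild a channel from its four scalar phase parameters."""
    amplitude, frequency, phase_shift, offset = params
    return [
        amplitude * math.sin(2 * math.pi * (frequency * t / fps - phase_shift)) + offset
        for t in range(num_frames)
    ]


# Usage: fit a synthetic 4-second channel, then edit it through the scalar code.
fps, frames = 30, 120
channel = synthesize((0.8, 1.0, 0.25, 0.1), frames, fps)  # A, f (Hz), s (cycles), b
A, f, s, b = fit_phase_params(channel, fps)
# Double the magnitude and delay the timing by a quarter cycle.
edited = synthesize((2 * A, f, (s + 0.25) % 1.0, b), frames, fps)
```

Because the code is just four scalars per channel, an edit is plain arithmetic: scaling A changes magnitude, scaling f changes speed, and shifting s moves timing. In this sketch the edited code is re-synthesized directly; in the described framework it would instead condition the Phase ControlNet.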


📝 Abstract

Text-to-motion (T2M) generation is becoming a practical tool for animation and interactive avatars. However, modifying specific body parts while maintaining overall motion coherence remains challenging. Existing methods typically rely on cumbersome, high-dimensional joint constraints (e.g., trajectories), which hinder user-friendly, iterative refinement. To address this, we propose Modular Body-Part Phase Control, a plug-and-play framework enabling structured, localized editing via a compact, scalar-based phase interface. By modeling body-part latent motion channels as sinusoidal phase signals characterized by amplitude, frequency, phase shift, and offset, we extract interpretable codes that capture part-specific dynamics. A modular Phase ControlNet branch then injects this signal via residual feature modulation, seamlessly decoupling control from the generative backbone. Experiments on both diffusion- and flow-based models demonstrate that our approach provides predictable and fine-grained control over motion magnitude, speed, and timing. It preserves global motion coherence and offers a practical paradigm for controllable T2M generation. Project page: https://jixiii.github.io/bp-phase-project-page/
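The abstract states that the Phase ControlNet injects the phase code through residual feature modulation while staying decoupled from the generative backbone. The sketch below shows one way this could look, assuming a FiLM-style scale-and-shift predicted from the flattened per-part codes, plus ControlNet-style zero initialization so the backbone's features pass through unchanged before the branch is trained. The class `ResidualPhaseModulation`, the five-part body split, and the feature sizes are hypothetical.

```python
class ResidualPhaseModulation:
    """Hypothetical residual, FiLM-style injection of body-part phase codes.

    gamma, beta = W @ code + c;  out = h + gamma * h + beta.
    W and c start at zero (ControlNet-style zero init), so at initialization
    the backbone features pass through unchanged.
    """

    def __init__(self, code_dim, feat_dim):
        self.feat_dim = feat_dim
        # One row per output: the first feat_dim rows give gamma, the rest beta.
        self.weight = [[0.0] * code_dim for _ in range(2 * feat_dim)]
        self.bias = [0.0] * (2 * feat_dim)

    def __call__(self, features, code):
        proj = [
            sum(w * c for w, c in zip(row, code)) + b
            for row, b in zip(self.weight, self.bias)
        ]
        gamma, beta = proj[: self.feat_dim], proj[self.feat_dim :]
        # Residual modulation: the term gamma * h + beta is added on top of h.
        return [h + g * h + bt for h, g, bt in zip(features, gamma, beta)]


# Usage: five hypothetical body parts, each with an (A, f, s, b) code.
parts = ["torso", "left_arm", "right_arm", "left_leg", "right_leg"]
code = [0.8, 1.0, 0.25, 0.1] * len(parts)  # flattened per-part phase codes
features = [0.5, -1.0, 2.0, 0.0]  # stand-in for one backbone hidden vector
mod = ResidualPhaseModulation(code_dim=len(code), feat_dim=len(features))
assert mod(features, code) == features  # zero init leaves the backbone untouched
```

Zero initialization is the usual ControlNet device for attaching a control branch without disturbing a pretrained model; whether this paper uses a FiLM-style scale-and-shift or another modulation form is not stated in the abstract.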

Problem

Research questions and friction points this paper is trying to address.

Text-to-Motion

motion control

body-part editing

motion coherence

controllable generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular Body-Part Phase Control

Text-to-Motion Generation

Phase-based Motion Representation