🤖 AI Summary
This work addresses action-visual inconsistency in robot video generation, which arises when action vectors serve only as passive conditioning signals. To resolve this, we propose an action-aware diffusion inference framework that requires no additional training. Our method comprises two key components: (1) classifier-free guidance with action-scaled guidance weights, which dynamically modulates denoising strength; and (2) action-driven Gaussian latent initialization with noise truncation, which explicitly models the temporal influence of action trajectories on the generative process. Crucially, this is the first approach in which action parameters exert differentiable, active control over a diffusion model *during inference*, without architectural or training modifications. Experiments on real-world robot manipulation datasets demonstrate significant improvements in motion coherence and visual fidelity, and the framework applies broadly to trajectory-to-video synthesis across diverse robotic scenarios.
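
To make the first component concrete, the sketch below shows one way action-scaled classifier-free guidance could look at a single denoising step. The names (`eps_uncond`, `eps_cond`, `base_weight`, `alpha`) and the linear scaling rule are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def action_scaled_cfg(eps_uncond: torch.Tensor,
                      eps_cond: torch.Tensor,
                      action: torch.Tensor,
                      base_weight: float = 7.5,
                      alpha: float = 1.0) -> torch.Tensor:
    """Classifier-free guidance with a weight scaled by action magnitude.

    eps_uncond / eps_cond: the model's noise predictions without and with
    action conditioning at the current denoising step.
    """
    # Scale the guidance weight by the action norm: larger actions push
    # the sample harder toward the action-conditioned prediction, while
    # alpha = 0 recovers plain CFG with `base_weight`.
    w = base_weight * (1.0 + alpha * action.norm())
    return eps_uncond + w * (eps_cond - eps_uncond)
```

Because setting `alpha = 0` reduces this to standard classifier-free guidance, the effect of action scaling can be ablated with a single hyperparameter.
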
📝 Abstract
Generating realistic robot videos from explicit action trajectories is a critical step toward building effective world models and robotics foundation models. We introduce two training-free, inference-time techniques that fully exploit explicit action parameters in diffusion-based robot video generation. Instead of treating action vectors as passive conditioning signals, our methods use them actively to steer both classifier-free guidance and the initialization of the Gaussian latents. First, action-scaled classifier-free guidance dynamically modulates guidance strength in proportion to action magnitude, enhancing controllability over motion intensity. Second, action-scaled noise truncation adjusts the distribution of the initially sampled noise to better align with the desired motion dynamics. Experiments on real robot manipulation datasets demonstrate that these techniques significantly improve action coherence and visual quality across diverse robot environments.
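
The second technique can likewise be sketched as rejection sampling from a truncated Gaussian whose bound depends on action magnitude. The direction of the mapping (here, larger actions widen the admissible noise range to permit stronger motion) and all names are assumptions for illustration, not the paper's exact procedure.

```python
import torch

def action_scaled_init_noise(shape: tuple,
                             action: torch.Tensor,
                             base_bound: float = 2.0,
                             beta: float = 0.5) -> torch.Tensor:
    """Sample initial diffusion latents from a truncated Gaussian whose
    truncation bound grows with action magnitude."""
    # Illustrative mapping: small actions -> tighter noise (calmer video),
    # large actions -> wider noise (stronger motion dynamics).
    bound = base_bound * (1.0 + beta * action.norm())
    noise = torch.randn(shape)
    # Rejection step: resample values outside [-bound, bound] so the
    # result approximates a truncated normal distribution.
    mask = noise.abs() > bound
    while mask.any():
        noise[mask] = torch.randn(int(mask.sum()))
        mask = noise.abs() > bound
    return noise
```

Resampling, rather than clamping, keeps the marginal distribution Gaussian within the bound instead of piling probability mass at the truncation edges.
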