🤖 AI Summary
Action noise in human demonstrations, such as jitter and pauses, degrades trajectory coherence in flow-matching-based vision-language-action (VLA) models, leading to deployment instability and failures in fine-grained manipulation. To address this, we propose a **training-free, test-time action coherence guidance method** that dynamically refines action sequences during inference to improve smoothness and temporal consistency, substantially increasing robustness to demonstration noise. Our approach is framework-agnostic, integrating with both diffusion and flow-matching VLA architectures without introducing additional parameters or training overhead. We evaluate it on RoboCasa, DexMimicGen, and real-world SO-101 tasks, demonstrating substantial improvements in action coherence metrics and task success rates. The method provides a lightweight, general-purpose, plug-and-play stability enhancement for practical VLA deployment.
📝 Abstract
Diffusion and flow matching models have emerged as powerful robot policies, enabling Vision-Language-Action (VLA) models to generalize across diverse scenes and instructions. Yet, when trained via imitation learning, their high generative capacity makes them sensitive to noise in human demonstrations: jerks, pauses, and jitter that reduce action coherence. Reduced action coherence causes instability and trajectory drift during deployment, failures that are catastrophic in fine-grained manipulation where precision is crucial. In this paper, we present Action Coherence Guidance (ACG) for VLA models, a training-free, test-time guidance algorithm that improves action coherence and thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and real-world SO-101 tasks, ACG consistently improves action coherence and boosts success rates across diverse manipulation tasks. Code and project page are available at https://github.com/DAVIAN-Robotics/ACG and https://DAVIAN-Robotics.github.io/ACG , respectively.
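The abstract does not spell out ACG's update rule, so the following is only a hypothetical sketch of what test-time coherence guidance can look like in general: during each Euler step of a flow-matching sampler, the action chunk is nudged down the gradient of a temporal-smoothness penalty. Every name here (`toy_velocity_field`, `guidance_scale`, the penalty itself) is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def smoothness_grad(actions):
    """Gradient of sum_t ||a_{t+1} - a_t||^2 w.r.t. actions (shape [T, D])."""
    g = np.zeros_like(actions)
    diff = actions[1:] - actions[:-1]            # [T-1, D]
    g[:-1] -= 2.0 * diff                         # d/da_t     of ||a_{t+1} - a_t||^2
    g[1:] += 2.0 * diff                          # d/da_{t+1} of ||a_{t+1} - a_t||^2
    return g

def guided_euler_sample(velocity_field, x0, n_steps=50, guidance_scale=0.1):
    """Euler integration of a flow-matching ODE with an extra guidance term
    that pushes the sampled action chunk toward temporal smoothness."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity_field(x, t)                      # base policy velocity
        v = v - guidance_scale * smoothness_grad(x)   # coherence guidance (assumed form)
        x = x + dt * v
    return x

# Toy "policy": a velocity field that pulls the sample toward a jittery
# target trajectory, mimicking noise learned from human demonstrations.
rng = np.random.default_rng(0)
target = np.cumsum(rng.normal(size=(16, 2)), axis=0) \
         + rng.normal(scale=0.5, size=(16, 2))

def toy_velocity_field(x, t):
    return target - x

x0 = rng.normal(size=(16, 2))
plain = guided_euler_sample(toy_velocity_field, x0, guidance_scale=0.0)
guided = guided_euler_sample(toy_velocity_field, x0, guidance_scale=0.5)

def total_variation(a):
    """Sum of step-to-step action changes; lower means more coherent."""
    return float(np.sum(np.linalg.norm(a[1:] - a[:-1], axis=-1)))
```

Because the guidance term only adds a gradient to the velocity, it needs no extra parameters or training, which matches the plug-and-play property the abstract claims: the guided trajectory ends up with lower total variation than the unguided one while still tracking the same target.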