🤖 AI Summary
This work addresses the challenge of deploying flow-based vision-language-action (VLA) models in online reinforcement learning, where intractable likelihoods during multi-step sampling hinder effective training. The authors propose π-StepNFT, a framework that enables likelihood-free, value-network-free online training of flow-based VLA policies. By performing policy updates via a single forward pass per optimization step and incorporating a step-wise negative-aware fine-tuning mechanism, the method achieves fine-grained policy alignment across broad action spaces. Evaluated on the LIBERO benchmark, π-StepNFT demonstrates strong few-shot robustness, and it significantly outperforms value-based baselines in out-of-distribution scenarios on ManiSkill by preventing overfitting to multimodal features.
📝 Abstract
Flow-based vision-language-action (VLA) models excel in embodied control but suffer from intractable likelihoods during multi-step sampling, hindering online reinforcement learning. We propose **π-StepNFT** (Step-wise Negative-aware Fine-Tuning), a critic-free and likelihood-free framework that requires only a single forward pass per optimization step and eliminates auxiliary value networks. We identify that wider exploration spaces necessitate finer-grained, step-wise guidance for alignment. Empirically, π-StepNFT unlocks latent potential on LIBERO with competitive few-shot robustness. Moreover, it achieves superior generalization on ManiSkill, outperforming value-based baselines in OOD scenarios by preventing overfitting to multimodal features. These properties make it a scalable solution promising for complex real-world applications.
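To make the core idea concrete, below is a minimal, hypothetical sketch of a step-wise negative-aware objective for a flow-matching policy. It is not the paper's implementation: the function names, the use of a simple advantage sign as the positive/negative signal, and the plain squared-error flow-matching loss are all illustrative assumptions. The point it shows is that each denoising step contributes its own loss term (step-wise guidance), weighted toward positively rewarded samples and away from negative ones, with no likelihood computation and no value network.

```python
import numpy as np


def flow_matching_loss(v_pred, v_target):
    # Per-sample mean squared error between predicted and target velocities
    # at one denoising step; shape (batch,).
    return np.mean((v_pred - v_target) ** 2, axis=-1)


def stepwise_negative_aware_loss(v_pred_steps, v_target_steps, advantages):
    """Hypothetical step-wise negative-aware objective (illustrative only).

    v_pred_steps, v_target_steps: arrays of shape (num_steps, batch, action_dim),
        the policy's predicted velocities and the flow-matching targets at each
        denoising step of the sampled action chunk.
    advantages: shape (batch,), a scalar return signal per sampled trajectory
        (assumed here; the paper may use a different positive/negative criterion).
    """
    # +1 for positively rewarded samples (pull toward), -1 for negatives (push away).
    signs = np.where(advantages > 0, 1.0, -1.0)
    # One flow-matching loss per denoising step -> shape (num_steps, batch).
    per_step = np.stack([flow_matching_loss(p, t)
                         for p, t in zip(v_pred_steps, v_target_steps)])
    # Sign-weighted average over steps and batch; a single forward pass per
    # optimization step suffices because no likelihood or critic is evaluated.
    return float(np.mean(signs * per_step))
```

In this sketch, minimizing the loss decreases the flow-matching error on positive samples and increases it on negative ones, giving per-step alignment pressure without ever evaluating the policy's (intractable) likelihood.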