Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning

📅 2026-02-20
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the challenge that offline reinforcement learning policies often struggle to adapt during online fine-tuning due to insufficient exploration. To bridge the gap between offline pretraining and online adaptation, the authors propose FINO, a novel approach that integrates flow matching generative models with action-space noise injection and an entropy-guided sampling mechanism. This combination explicitly enhances exploratory behavior within the pretrained policy, enabling efficient fine-tuning under limited online interaction budgets. By promoting structured exploration while preserving learned offline knowledge, FINO significantly improves sample efficiency. Empirical evaluations across diverse and complex tasks consistently demonstrate that FINO outperforms existing state-of-the-art methods.

📝 Abstract
Generative models have recently demonstrated remarkable success across diverse domains, motivating their adoption as expressive policies in reinforcement learning (RL). While they have shown strong performance in offline RL, particularly where the target distribution is well defined, their extension to online fine-tuning has largely been treated as a direct continuation of offline pre-training, leaving key challenges unaddressed. In this paper, we propose Flow Matching with Injected Noise for Offline-to-Online RL (FINO), a novel method that leverages flow matching-based policies to enhance sample efficiency for offline-to-online RL. FINO facilitates effective exploration by injecting noise into policy training, thereby encouraging a broader range of actions beyond those observed in the offline dataset. In addition to exploration-enhanced flow policy training, we incorporate an entropy-guided sampling mechanism to balance exploration and exploitation, allowing the policy to adapt its behavior throughout online fine-tuning. Experiments across diverse, challenging tasks demonstrate that FINO consistently achieves superior performance under limited online budgets.
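The abstract's core training idea, a conditional flow-matching loss whose dataset actions are perturbed with injected noise before the interpolation path is built, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the noise placement (`noise_std` on the target actions), the linear interpolation path, and the `dummy_v` model are all assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(v_theta, states, actions, noise_std=0.1):
    """Conditional flow-matching loss with Gaussian noise injected into the
    dataset actions (illustrative; the paper's exact noise scheme may differ)."""
    a1 = actions + noise_std * rng.normal(size=actions.shape)  # injected noise
    a0 = rng.normal(size=actions.shape)                        # base distribution samples
    t = rng.uniform(size=(actions.shape[0], 1))                # random path times
    a_t = (1.0 - t) * a0 + t * a1        # linear interpolation between a0 and a1
    target_v = a1 - a0                   # velocity of the linear path
    pred_v = v_theta(states, a_t, t)     # state-conditioned velocity prediction
    return np.mean((pred_v - target_v) ** 2)

# Toy velocity model standing in for a neural network: ignores the
# state and always predicts zero velocity.
def dummy_v(states, a_t, t):
    return np.zeros_like(a_t)

states = rng.normal(size=(32, 4))   # batch of 4-dim states
actions = rng.normal(size=(32, 2))  # batch of 2-dim dataset actions
loss = flow_matching_loss(dummy_v, states, actions)
```

Because the targets `a1` are perturbed before interpolation, the learned velocity field transports the base distribution onto a noise-broadened version of the offline action distribution, which is one plausible way to widen the policy's action coverage for online exploration.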
Problem

Research questions and friction points this paper is trying to address.

offline-to-online reinforcement learning
sample efficiency
exploration
policy adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

flow matching
injected noise
offline-to-online reinforcement learning
entropy-guided sampling
sample efficiency
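One plausible reading of the entropy-guided sampling mechanism listed above is: draw several candidate actions from the flow policy, use their spread as an entropy proxy, and switch between exploring (random candidate) and exploiting (highest-value candidate). The helper name, the Gaussian entropy proxy, and the threshold below are all illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

rng = np.random.default_rng(1)

def entropy_guided_select(candidates, q_values, entropy_threshold=1.0):
    """Select one action from K sampled candidates.

    High candidate spread (entropy proxy above threshold) -> explore by
    sampling a candidate uniformly; otherwise exploit the highest-Q one.
    Illustrative sketch only; the paper's mechanism may differ.
    """
    # Gaussian differential-entropy proxy from per-dimension variance.
    var = candidates.var(axis=0) + 1e-8
    entropy_proxy = 0.5 * np.sum(np.log(2.0 * np.pi * np.e * var))
    if entropy_proxy > entropy_threshold:
        idx = rng.integers(len(candidates))   # explore: random candidate
    else:
        idx = int(np.argmax(q_values))        # exploit: best-Q candidate
    return candidates[idx]

# Usage: widely spread candidates trigger the exploratory branch.
cands = rng.normal(scale=2.0, size=(8, 2))
qs = rng.normal(size=8)
chosen = entropy_guided_select(cands, qs)
```

Tying the explore/exploit switch to the policy's own sampling entropy lets the behavior shift automatically during fine-tuning: early on, the noise-broadened policy is diffuse and exploratory; as it sharpens, selection drifts toward exploitation.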