VLM-Guided Experience Replay

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitation of conventional experience replay buffers in reinforcement learning, which lack semantic-aware prioritization and thereby constrain sample efficiency and policy performance. The authors propose a novel approach that integrates a frozen pre-trained vision-language model (VLM) into experience replay to automatically assess the semantic value of agent interaction sub-trajectories, enabling semantic-driven prioritized sampling without fine-tuning. This method maintains high interpretability while significantly enhancing generalization across diverse tasks. Evaluated on a range of discrete and continuous control benchmarks—including gaming and robotic manipulation—it achieves an average success rate improvement of 11–52% and boosts sample efficiency by 19–45% compared to standard baselines.

📝 Abstract
Recent advances in Large Language Models (LLMs) and Vision-Language Models (VLMs) have enabled powerful semantic and multimodal reasoning capabilities, creating new opportunities to enhance sample efficiency, high-level planning, and interpretability in reinforcement learning (RL). While prior work has integrated LLMs and VLMs into various components of RL, the replay buffer, a core component for storing and reusing experiences, remains unexplored. We propose addressing this gap by leveraging VLMs to guide the prioritization of experiences in the replay buffer. Our key idea is to use a frozen, pre-trained VLM (requiring no fine-tuning) as an automated evaluator to identify and prioritize promising sub-trajectories from the agent's experiences. Across scenarios, including game-playing and robotics, spanning both discrete and continuous domains, agents trained with our proposed prioritization method achieve 11-52% higher average success rates and improve sample efficiency by 19-45% compared to previous approaches. https://esharony.me/projects/vlm-rb/
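The core mechanism described above is a replay buffer whose sampling weights come from an external semantic evaluator rather than TD error. A minimal sketch of that idea follows; the `scorer` callable is a stand-in for the frozen VLM (in the paper, the VLM scores rendered sub-trajectories; here it is an arbitrary function, and all class and parameter names are illustrative, not from the paper):

```python
import random
from collections import deque

class SemanticPrioritizedBuffer:
    """Replay buffer that samples sub-trajectories in proportion to a
    semantic score from an external evaluator (e.g. a frozen VLM).
    A real system would render each sub-trajectory to frames and
    query the VLM with a task prompt; here `scorer` is pluggable."""

    def __init__(self, scorer, capacity=10_000, eps=1e-3):
        self.scorer = scorer      # callable: sub_trajectory -> score >= 0
        self.eps = eps            # floor so no experience has zero probability
        self.items = deque(maxlen=capacity)
        self.scores = deque(maxlen=capacity)

    def add(self, sub_trajectory):
        # Score once at insertion; the VLM is frozen, so the score is static.
        self.items.append(sub_trajectory)
        self.scores.append(self.scorer(sub_trajectory) + self.eps)

    def sample(self, k, rng=random):
        # Sample with replacement, weighted by semantic score.
        return rng.choices(list(self.items), weights=list(self.scores), k=k)

# Example: a toy scorer that prefers higher-reward sub-trajectories.
buf = SemanticPrioritizedBuffer(scorer=lambda t: t["reward"])
buf.add({"id": 0, "reward": 0.0})
buf.add({"id": 1, "reward": 1.0})
batch = buf.sample(200)   # heavily favors the higher-scored sub-trajectory
```

Scoring at insertion time (rather than per sample) keeps the expensive VLM call off the training hot path, which matches the frozen, no-fine-tuning setup the abstract describes.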
Problem

Research questions and friction points this paper is trying to address.

reinforcement learning
replay buffer
Vision-Language Models
experience prioritization
sample efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Models
Experience Replay
Reinforcement Learning
Sample Efficiency
Trajectory Prioritization