Exploration with Foundation Models: Capabilities, Limitations, and Hybrid Approaches

📅 2025-09-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
In sparse-reward reinforcement learning (RL), exploration is notoriously inefficient, and the zero-shot exploratory capability of foundation models (vision-language models, VLMs; large language models, LLMs) remains unclear. This paper benchmarks foundation models as zero-shot exploration agents across multi-armed bandits, grid worlds, and sparse-reward Atari environments, and studies a hybrid framework that uses VLMs to extract high-level semantic goals as guidance signals, combined with an online policy-mixing mechanism. Experiments show that VLM guidance significantly improves early-stage sample efficiency, but VLMs fail to replace policy networks for fine-grained action control, revealing a "knowing-doing gap." The core contribution is a systematic characterization of foundation models' role in RL exploration: they serve effectively as lightweight, interpretable guidance modules rather than end-to-end controllers, and the hybrid paradigm demonstrates efficacy, with delineated applicability conditions, in low-data regimes.

📝 Abstract
Exploration in reinforcement learning (RL) remains challenging, particularly in sparse-reward settings. While foundation models possess strong semantic priors, their capabilities as zero-shot exploration agents in classic RL benchmarks are not well understood. We benchmark LLMs and VLMs on multi-armed bandits, Gridworlds, and sparse-reward Atari to test zero-shot exploration. Our investigation reveals a key limitation: while VLMs can infer high-level objectives from visual input, they consistently fail at precise low-level control, the "knowing-doing gap". To analyze a potential bridge for this gap, we investigate a simple on-policy hybrid framework in a controlled, best-case scenario. Our results in this idealized setting show that VLM guidance can significantly improve early-stage sample efficiency, providing a clear analysis of the potential and constraints of using foundation models to guide exploration rather than for end-to-end control.
Problem

Research questions and friction points this paper is trying to address.

Benchmarking foundation models as zero-shot exploration agents in RL
Analyzing the knowing-doing gap between semantic inference and precise control
Investigating hybrid approaches to improve early-stage sample efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmarked LLMs/VLMs on zero-shot exploration tasks
Identified VLM limitations in low-level control tasks
Proposed hybrid framework combining VLM guidance with RL
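The policy-mixing idea behind the hybrid framework can be sketched as follows. The paper does not specify its exact mixing rule here, so this is a minimal illustrative sketch under one common assumption: with some probability (annealed over training) the agent follows the VLM-suggested action toward a semantic goal, and otherwise follows its own RL policy, so guidance dominates early exploration and fades as the policy improves.

```python
import random

def annealed_beta(step, beta0=0.9, decay=0.999):
    """Guidance probability, decayed exponentially over training steps.
    beta0 and decay are illustrative values, not taken from the paper."""
    return beta0 * (decay ** step)

def mixed_action(rl_action, vlm_action, beta):
    """Policy mixing: return the VLM-suggested action with probability beta,
    otherwise the RL policy's action. Hypothetical interface; the paper's
    actual mechanism may differ."""
    return vlm_action if random.random() < beta else rl_action

# Example: early in training (step 0) guidance is frequent; later it is rare.
beta_early = annealed_beta(0)      # 0.9
beta_late = annealed_beta(5000)    # much smaller
action = mixed_action("noop", "move_toward_key", beta_early)
```

At `beta = 1.0` the agent always follows the VLM suggestion and at `beta = 0.0` it is purely on-policy, so the schedule interpolates between guided and autonomous exploration.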
Remo Sasso
PhD student, Queen Mary University of London
Artificial Intelligence, Machine Learning, Reinforcement Learning
Michelangelo Conserva
School of Electronic Engineering and Computer Science, Queen Mary University of London, United Kingdom
Dominik Jeurissen
School of Electronic Engineering and Computer Science, Queen Mary University of London, United Kingdom
Paulo Rauber
School of Electronic Engineering and Computer Science, Queen Mary University of London, United Kingdom