Reinforcement Learning in Vision: A Survey

📅 2025-08-11
🤖 AI Summary
This survey systematically reviews the state of visual reinforcement learning (Visual RL), addressing the core challenge of tightly integrating visual perception with sequential decision-making. We formalize the Visual RL problem and trace the methodological evolution from RLHF to verifiable reward modeling, highlighting emerging paradigms such as curriculum learning, preference-aligned diffusion, and unified reward modeling. Methodologically, we establish a technical framework grounded in multimodal large models, visual generation, unified architectures, and vision-language-action models, integrating policy-optimization techniques (e.g., PPO, Group Relative Policy Optimization) with multimodal alignment and diffusion-based modeling. Our contributions include a comprehensive analysis of 200+ representative works, the first fine-grained taxonomy for Visual RL, and an open-source benchmarking repository. We identify sample efficiency, cross-task generalization, and safe deployment as critical open challenges for future research.
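The summary's move from PPO to Group Relative Policy Optimization (GRPO) can be made concrete: GRPO drops PPO's learned value baseline and instead scores each sampled response against the statistics of its own sampling group. A minimal sketch of that group-relative advantage computation, with illustrative function and variable names that are not from the paper:

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantage: normalize each sampled response's reward
    by the mean and std of its own sampling group, replacing PPO's learned
    value-function baseline with group statistics."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Four responses sampled for one prompt and scored by a reward function;
# above-average responses get positive advantage and are reinforced.
print(grpo_advantages([0.2, 0.9, 0.4, 0.5]))
```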

📝 Abstract
Recent advances at the intersection of reinforcement learning (RL) and visual intelligence have enabled agents that not only perceive complex visual scenes but also reason, generate, and act within them. This survey offers a critical and up-to-date synthesis of the field. We first formalize visual RL problems and trace the evolution of policy-optimization strategies from RLHF to verifiable reward paradigms, and from Proximal Policy Optimization to Group Relative Policy Optimization. We then organize more than 200 representative works into four thematic pillars: multi-modal large language models, visual generation, unified model frameworks, and vision-language-action models. For each pillar, we examine algorithmic design, reward engineering, and benchmark progress, and we distill trends such as curriculum-driven training, preference-aligned diffusion, and unified reward modeling. Finally, we review evaluation protocols spanning set-level fidelity, sample-level preference, and state-level stability, and we identify open challenges that include sample efficiency, generalization, and safe deployment. Our goal is to provide researchers and practitioners with a coherent map of the rapidly expanding landscape of visual RL and to highlight promising directions for future inquiry. Resources are available at: https://github.com/weijiawu/Awesome-Visual-Reinforcement-Learning.
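One of the distilled trends, preference-aligned diffusion, adapts DPO-style objectives to denoising models: the policy is rewarded for lowering its denoising error on a preferred image relative to a frozen reference model, and penalized for doing so on the dispreferred one. A minimal sketch in that spirit; the function name, β scale, and squared-error inputs are illustrative assumptions, not the survey's formulation:

```python
import torch
import torch.nn.functional as F

def diffusion_preference_loss(err_w_theta, err_w_ref,
                              err_l_theta, err_l_ref, beta=1.0):
    """DPO-style loss on per-sample denoising errors (MSE between true
    and predicted noise). Lowering the policy's error on the preferred
    sample, relative to the reference model, decreases the loss;
    lowering it on the dispreferred sample increases it."""
    margin = (err_w_ref - err_w_theta) - (err_l_ref - err_l_theta)
    return -F.logsigmoid(beta * margin).mean()

# Toy usage with a batch of 8 scalar errors per term.
e = [torch.rand(8) for _ in range(4)]
print(diffusion_preference_loss(*e))
```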
Problem

Research questions and friction points this paper is trying to address.

Surveying advances at the intersection of reinforcement learning and visual intelligence
Organizing 200+ representative works into a four-pillar thematic analysis
Identifying evaluation protocols and open challenges in visual RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Policy optimization traced from RLHF to verifiable rewards (see the sketch after this list)
Unified model frameworks and vision-language-action models
Curriculum-driven training and preference-aligned diffusion methods
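Where RLHF scores outputs with a learned and therefore gameable preference model, a verifiable reward is a deterministic check against ground truth. A minimal sketch of such a rule-based reward; the "Answer:" tag convention and function name are hypothetical, not taken from the survey:

```python
def verifiable_reward(response: str, ground_truth: str) -> float:
    """Rule-based reward: 1.0 iff the response's final answer exactly
    matches the ground truth, else 0.0. Unlike a learned RLHF reward
    model, this check is deterministic and hard to reward-hack."""
    # Hypothetical convention: the final answer follows an "Answer:" tag.
    answer = response.rsplit("Answer:", 1)[-1].strip()
    return 1.0 if answer == ground_truth.strip() else 0.0

print(verifiable_reward("Six times seven... Answer: 42", "42"))  # 1.0
```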
👥 Authors

Weijia Wu
National University of Singapore; Zhejiang University
Video Generation, LLM, AIGC

Chen Gao
Show Lab, National University of Singapore

Joya Chen
National University of Singapore
AI

Kevin Qinghong Lin
University of Oxford; National University of Singapore
Vision and Language, Video Understanding, AI Agent

Qingwei Meng
Zhejiang University

Yiming Zhang
The Chinese University of Hong Kong

Yuke Qiu
Zhejiang University

Hong Zhou
Zhejiang University

Mike Zheng Shou
Show Lab, National University of Singapore