Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach

📅 2025-05-10
📈 Citations: 0
Influential: 0

🤖 AI Summary
Offline reinforcement learning (RL) suffers from suboptimal policies and biased value estimation because the agent cannot interact with the environment. To address this, we propose an interactive world model framework grounded in natural videos, described here as the first approach to leverage large-scale, unlabeled online video data as a source of prior knowledge without requiring annotations or domain alignment. This enables cross-domain transfer of control commonsense and physical dynamics to target tasks. Our method integrates generative video modeling, implicit dynamics learning, model-guided policy distillation, and offline policy optimization. Evaluated on visuomotor control tasks, including robotic manipulation, autonomous driving, and open-world video games, it achieves substantial performance gains over state-of-the-art offline RL methods, exceeding 100% in some cases. Our core contributions are: (i) establishing a novel video-driven paradigm for world model construction, and (ii) realizing an end-to-end transfer pathway from natural video to embodied intelligent policies.

📝 Abstract
Offline reinforcement learning (RL) enables policy optimization from static datasets, avoiding the risks and costs of real-world exploration. However, it struggles with suboptimal behavior learning and inaccurate value estimation due to the lack of environmental interaction. In this paper, we present Video-Enhanced Offline RL (VeoRL), a model-based approach that constructs an interactive world model from diverse, unlabeled video data readily available online. Leveraging model-based behavior guidance, VeoRL transfers commonsense knowledge of control policies and physical dynamics from natural videos to the RL agent within the target domain. Our method achieves substantial performance gains (exceeding 100% in some cases) across visuomotor control tasks in robotic manipulation, autonomous driving, and open-world video games.
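To make the "exceeding 100%" figure concrete, the sketch below computes a relative performance gain. It assumes the percentage is a relative improvement over a baseline score, which the abstract does not state explicitly. The function name `relative_gain` and the scores are hypothetical and are not results from the paper.

```python
def relative_gain(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score, in percent.

    Assumption: "performance gain" means (new - baseline) / |baseline|.
    """
    if baseline_score == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return 100.0 * (new_score - baseline_score) / abs(baseline_score)


# Hypothetical scores, for illustration only.
doubled = relative_gain(40.0, 20.0)      # score doubles
more_than_doubled = relative_gain(50.0, 20.0)
```

Under this reading, a gain above 100% means the method more than doubles the baseline score on that task.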
Problem

Research questions and friction points this paper is trying to address.

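Offline RL learns suboptimal behavior when restricted to static datasets
Value estimates are inaccurate without environmental interaction
Unlabeled online video is a potential source of control and physical-dynamics knowledge for offline RL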
Innovation

Methods, ideas, or system contributions that make the work stand out.

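Builds an interactive world model from diverse, unlabeled online video data
Transfers knowledge of control policies and physical dynamics from natural videos to the RL agent through model-based behavior guidance
Reports substantial gains (exceeding 100% in some cases) on visuomotor control tasks in robotic manipulation, autonomous driving, and open-world video games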
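
The general idea behind model-based offline RL, as named in the points above, can be sketched with a toy example: fit a dynamics model from logged transitions, then compare candidate policies on imagined rollouts without touching the environment. This is not the VeoRL method. The 1-D linear dynamics, the reward, and every name here (`fit_linear_dynamics`, `imagined_return`, the candidate policies) are assumptions chosen for illustration, and no video data is involved.

```python
import random


def fit_linear_dynamics(transitions):
    """Least-squares fit of s_next ≈ a*s + b*u from (s, u, s_next) tuples."""
    sxx = sum(s * s for s, u, _ in transitions)
    sxu = sum(s * u for s, u, _ in transitions)
    suu = sum(u * u for _, u, _ in transitions)
    sxy = sum(s * y for s, _, y in transitions)
    suy = sum(u * y for _, u, y in transitions)
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("data does not identify both coefficients")
    # Solve the 2x2 normal equations by Cramer's rule.
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b


def imagined_return(model, policy, s0, horizon=10):
    """Roll a policy forward inside the learned model and sum the rewards."""
    a, b = model
    s, total = s0, 0.0
    for _ in range(horizon):
        u = policy(s)
        total += -(s * s)      # toy reward: keep the state near zero
        s = a * s + b * u      # imagined next state, no real environment
    return total


# Hypothetical offline dataset logged from s_next = 0.9*s + 0.5*u.
random.seed(0)
offline_data = []
for _ in range(200):
    s = random.uniform(-1.0, 1.0)
    u = random.uniform(-1.0, 1.0)
    offline_data.append((s, u, 0.9 * s + 0.5 * u))

model = fit_linear_dynamics(offline_data)

# Compare two hand-written candidate policies purely in imagination.
candidates = {"do_nothing": lambda s: 0.0, "push_back": lambda s: -s}
scores = {name: imagined_return(model, pi, s0=1.0) for name, pi in candidates.items()}
best = max(scores, key=scores.get)
```

In this toy setting the learned model recovers the logging dynamics, so ranking policies by imagined return stands in for the interaction that offline RL lacks. With real data the model is imperfect, which is where approaches such as the one summarized here bring in additional prior knowledge.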