Recurrent Off-Policy Deep Reinforcement Learning Doesn't Have to be Slow

📅 2025-12-23

📈 Citations: 0

✨ Influential: 0

career value

167K/year

🤖 AI Summary

Recurrent off-policy deep reinforcement learning (DRL) suffers from high computational overhead, limiting its practical deployment. Method: This paper proposes RISE, a lightweight framework that integrates learnable and fixed encoders synergistically to enable efficient temporal modeling without significant computational overhead; it introduces a simplified recurrent encoding mechanism that decouples temporal modeling from computationally expensive operations. Contribution/Results: RISE achieves plug-and-play integration of recurrent structures into mainstream off-policy algorithms—including DQN and SAC—for the first time. Evaluated on the Atari benchmark, RISE improves human-normalized interquartile mean (IQM) by 35.6% while increasing inference latency by less than 5%, demonstrating the feasibility of low-overhead, high-performance temporal modeling in off-policy DRL.

Technology Category

Application Category

📝 Abstract

Recurrent off-policy deep reinforcement learning models achieve state-of-the-art performance but are often sidelined due to their high computational demands. In response, we introduce RISE (Recurrent Integration via Simplified Encodings), a novel approach that can leverage recurrent networks in any image-based off-policy RL setting without significant computational overheads via using both learnable and non-learnable encoder layers. When integrating RISE into leading non-recurrent off-policy RL algorithms, we observe a 35.6% human-normalized interquartile mean (IQM) performance improvement across the Atari benchmark. We analyze various implementation strategies to highlight the versatility and potential of our proposed framework.

Problem

Research questions and friction points this paper is trying to address.

Recurrent off-policy RL is computationally expensive for image-based tasks

High computational demands limit practical use of recurrent networks in RL

Existing methods sacrifice performance for efficiency in recurrent RL models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines learnable and non-learnable encoder layers

Integrates recurrent networks into off-policy RL efficiently

Achieves performance gains without computational overhead

🔎 Similar Papers

Efficient Off-Policy Learning for High-Dimensional Action Spaces