Energy-Based Transfer for Reinforcement Learning

📅 2025-06-19

📈 Citations: 0

✨ Influential: 0

career value

218K/year

🤖 AI Summary

To address low sample efficiency of reinforcement learning in multi-task and continual learning settings, this paper proposes an energy-based adaptive policy transfer method. Our approach dynamically gates teacher policy intervention by linking energy scores with state visitation density theory, enabling out-of-distribution detection: the teacher guides exploration only in states it has encountered, thereby preventing cross-task bias. This mechanism mitigates exploration bias arising from mismatched teacher knowledge transfer. Evaluated on both single-task and multi-task benchmarks, our method achieves an average performance gain of 23% and accelerates convergence by 1.8×, significantly improving sample efficiency and generalization robustness.

Technology Category

Application Category

📝 Abstract

Reinforcement learning algorithms often suffer from poor sample efficiency, making them challenging to apply in multi-task or continual learning settings. Efficiency can be improved by transferring knowledge from a previously trained teacher policy to guide exploration in new but related tasks. However, if the new task sufficiently differs from the teacher's training task, the transferred guidance may be sub-optimal and bias exploration toward low-reward behaviors. We propose an energy-based transfer learning method that uses out-of-distribution detection to selectively issue guidance, enabling the teacher to intervene only in states within its training distribution. We theoretically show that energy scores reflect the teacher's state-visitation density and empirically demonstrate improved sample efficiency and performance across both single-task and multi-task settings.

Problem

Research questions and friction points this paper is trying to address.

Improving poor sample efficiency in reinforcement learning

Transferring knowledge from teacher to new related tasks

Avoiding sub-optimal guidance in differing tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Energy-based transfer learning method

Selective guidance via out-of-distribution detection

Intervention in teacher-trained states only

🔎 Similar Papers

No similar papers found.