🤖 AI Summary
This work addresses the trade-off between stability and recency introduced by target networks in reinforcement learning by proposing a novel framework built on a target alignment mechanism. The method measures the consistency of Q-value estimates between the online and target networks and prioritizes the replay of experience samples exhibiting high alignment, thereby enhancing the recency of learning signals while preserving training stability. Theoretical analysis demonstrates that this strategy accelerates convergence, and empirical evaluations across multiple benchmark environments show consistent performance improvements over standard algorithms, confirming both its effectiveness and its ability to generalize.
📝 Abstract
Many reinforcement learning algorithms rely on target networks (lagged copies of the online network) to stabilize training. While effective, this mechanism introduces a fundamental stability-recency tradeoff: slower target updates improve stability but reduce the recency of learning signals, hindering convergence speed. We propose Target-Aligned Reinforcement Learning (TARL), a framework that emphasizes transitions for which the target and online network estimates are highly aligned. By focusing updates on well-aligned targets, TARL mitigates the adverse effects of stale target estimates while retaining the stabilizing benefits of target networks. We provide a theoretical analysis showing that target alignment correction accelerates convergence, and empirically demonstrate consistent improvements over standard reinforcement learning algorithms across various benchmark environments.
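The abstract does not specify the exact alignment metric or sampling rule, but the core idea (prioritizing replay of transitions whose online and target Q-estimates agree) can be sketched as follows. This is an illustrative toy, assuming a negative-absolute-gap alignment score and softmax sampling probabilities; the function names and the `temperature` parameter are hypothetical, not taken from the paper.

```python
import numpy as np

def alignment_priorities(q_online, q_target, temperature=1.0):
    """Toy alignment-based replay priorities (illustrative, not TARL's exact rule).

    Transitions whose online and target Q-estimates agree closely
    (small absolute gap) receive higher sampling probability.
    """
    gap = np.abs(q_online - q_target)       # per-transition disagreement
    scores = -gap / temperature             # high alignment -> high score
    shifted = scores - scores.max()         # shift for numerical stability
    probs = np.exp(shifted)
    return probs / probs.sum()              # normalized sampling distribution

def sample_batch(probs, batch_size, seed=None):
    """Sample replay-buffer indices according to alignment priorities."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(probs), size=batch_size, replace=False, p=probs)

# Example: 5 transitions; the first two are well aligned, the last is stale.
q_online = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
q_target = np.array([1.05, 2.1, 4.0, 2.0, 8.0])
p = alignment_priorities(q_online, q_target)
batch = sample_batch(p, batch_size=2, seed=0)
```

Under this sketch, the well-aligned transitions (small gap) dominate the sampling distribution, so fresh, consistent learning signals are replayed more often than stale ones.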