Improved Training Mechanism for Reinforcement Learning via Online Model Selection

📅 2025-12-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses three key challenges in reinforcement learning (RL): resource constraints, environmental non-stationarity, and sensitivity to random seeds—limitations inherent in static hyperparameter and architecture configurations. To this end, we propose the first theory-driven online model selection framework for RL. Our approach formulates model selection as a meta-RL problem and introduces a differentiable online selection mechanism that jointly optimizes neural architectures, learning rates, and self-model selection policies during training. Theoretical analysis establishes convergence and stability guarantees for the proposed selection criterion. Empirical evaluation across multiple standard RL benchmarks demonstrates that our method improves training efficiency by 23–41% over strong baselines, reduces policy performance variance by 57%, and maintains robust adaptation under non-stationary dynamics. These results substantiate the practical value of theoretically grounded guidance in adaptive model selection for RL.

Technology Category

Application Category

📝 Abstract
We study the problem of online model selection in reinforcement learning, where the selector has access to a class of reinforcement learning agents and learns to adaptively select the agent with the right configuration. Our goal is to establish the improved efficiency and performance gains achieved by integrating online model selection methods into reinforcement learning training procedures. We examine the theoretical characterizations that are effective for identifying the right configuration in practice, and address three practical criteria from a theoretical perspective: 1) Efficient resource allocation, 2) Adaptation under non-stationary dynamics, and 3) Training stability across different seeds. Our theoretical results are accompanied by empirical evidence from various model selection tasks in reinforcement learning, including neural architecture selection, step-size selection, and self model selection.
Problem

Research questions and friction points this paper is trying to address.

Online model selection for reinforcement learning agents
Improving efficiency and performance via adaptive configuration selection
Addressing resource allocation, non-stationary dynamics, and training stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online model selection for adaptive agent configuration
Theoretical criteria for efficient resource and stability
Empirical validation across diverse RL selection tasks
🔎 Similar Papers
No similar papers found.