🤖 AI Summary
This work addresses three key challenges in reinforcement learning (RL): resource constraints, environmental non-stationarity, and sensitivity to random seeds—limitations inherent in static hyperparameter and architecture configurations. To this end, we propose the first theory-driven online model selection framework for RL. Our approach formulates model selection as a meta-RL problem and introduces a differentiable online selection mechanism that jointly optimizes neural architectures, learning rates, and self-model selection policies during training. Theoretical analysis establishes convergence and stability guarantees for the proposed selection criterion. Empirical evaluation across multiple standard RL benchmarks demonstrates that our method improves training efficiency by 23–41% over strong baselines, reduces policy performance variance by 57%, and maintains robust adaptation under non-stationary dynamics. These results substantiate the practical value of theoretically grounded guidance in adaptive model selection for RL.
📝 Abstract
We study the problem of online model selection in reinforcement learning, where the selector has access to a class of reinforcement learning agents and learns to adaptively select the agent with the right configuration. Our goal is to establish the improved efficiency and performance gains achieved by integrating online model selection methods into reinforcement learning training procedures. We examine the theoretical characterizations that are effective for identifying the right configuration in practice, and address three practical criteria from a theoretical perspective: 1) Efficient resource allocation, 2) Adaptation under non-stationary dynamics, and 3) Training stability across different seeds. Our theoretical results are accompanied by empirical evidence from various model selection tasks in reinforcement learning, including neural architecture selection, step-size selection, and self model selection.