AI Summary
This work addresses the challenge of reinforcement learning agents failing in real-world non-stationary environments due to shifting reward functions or dynamically expanding action spaces. To tackle this, the authors propose MORPHIN, a novel framework that integrates concept drift detection with Q-learning for the first time. MORPHIN enables online continual learning without full retraining by adaptively adjusting exploration hyperparameters, dynamically expanding the action space, and preserving historical policy knowledge to mitigate catastrophic forgetting. Evaluated in Gridworld and traffic signal control simulations, MORPHIN achieves up to 1.7× faster convergence compared to standard Q-learning, significantly enhancing agent adaptability in non-stationary settings.
Abstract
Reinforcement Learning (RL) agents often struggle in real-world applications where environmental conditions are non-stationary, particularly when reward functions shift or the available action space expands. This paper introduces MORPHIN, a self-adaptive Q-learning framework that enables on-the-fly adaptation without full retraining. By integrating concept drift detection with dynamic adjustments to learning and exploration hyperparameters, MORPHIN adapts agents to changes in both the reward function and on-the-fly expansions of the agent's action space, while preserving prior policy knowledge to prevent catastrophic forgetting. We validate our approach using a Gridworld benchmark and a traffic signal control simulation. The results demonstrate that MORPHIN achieves superior convergence speed and continuous adaptation compared to a standard Q-learning baseline, improving learning efficiency by up to 1.7x.
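The mechanism described above can be illustrated with a minimal tabular sketch. This is not the authors' implementation; the class name, the TD-error-based drift proxy, and all thresholds are hypothetical stand-ins for MORPHIN's actual drift detector and adaptation rules. It shows the three ingredients the abstract names: detecting a shift, re-boosting exploration instead of retraining, and growing the Q-table's action dimension while keeping prior values.

```python
import random

class DriftAwareQAgent:
    """Hypothetical sketch of drift-aware tabular Q-learning: on detected
    drift, exploration is re-boosted and the Q-table can grow new action
    columns while existing values are preserved."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.recent_errors = []  # sliding window of |TD error| as a crude drift signal

    def act(self, state):
        # Epsilon-greedy action selection over however many actions currently exist.
        if random.random() < self.epsilon:
            return random.randrange(len(self.q[state]))
        row = self.q[state]
        return max(range(len(row)), key=row.__getitem__)

    def update(self, s, a, r, s_next):
        # Standard Q-learning update; the TD error also feeds the drift monitor.
        td_error = r + self.gamma * max(self.q[s_next]) - self.q[s][a]
        self.q[s][a] += self.alpha * td_error
        self.recent_errors.append(abs(td_error))
        if len(self.recent_errors) > 50:
            self.recent_errors.pop(0)

    def drift_detected(self, threshold=1.0):
        # Crude proxy: sustained high TD error suggests the reward function
        # has shifted (a real concept-drift detector would be more principled).
        return (len(self.recent_errors) == 50
                and sum(self.recent_errors) / 50 > threshold)

    def on_drift(self):
        # Re-boost exploration instead of retraining from scratch;
        # Q-values are kept, mitigating catastrophic forgetting.
        self.epsilon = min(1.0, self.epsilon * 3)

    def expand_actions(self, n_new):
        # Grow the action space on the fly: new actions start at zero
        # while existing estimates are preserved.
        for row in self.q:
            row.extend([0.0] * n_new)
```

The key design point mirrored from the abstract is that adaptation is local: a drift signal adjusts hyperparameters and the table's shape, but never discards the learned policy.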