Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions

📅 2025-09-29
🏛️ 2025 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C)
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of reinforcement learning agents failing in real-world non-stationary environments due to shifting reward functions or dynamically expanding action spaces. To tackle this, the authors propose MORPHIN, a novel framework that, for the first time, integrates concept drift detection with Q-learning. MORPHIN enables online continual learning without full retraining by adaptively adjusting exploration hyperparameters, dynamically expanding the action space, and preserving historical policy knowledge to mitigate catastrophic forgetting. Evaluated in Gridworld and traffic signal control simulations, MORPHIN achieves up to 1.7× faster convergence than standard Q-learning, significantly enhancing agent adaptability in non-stationary settings.
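The mechanism described above can be illustrated with a minimal sketch. The class below is an assumption-laden toy, not the paper's implementation: it pairs tabular Q-learning with a crude sliding-window drift test (the window size, threshold, and epsilon-boost factor are all invented for illustration), boosts exploration when drift fires, and grows the action set in place so existing Q-values, i.e. prior policy knowledge, are preserved rather than retrained from scratch.

```python
import random


class AdaptiveQLearner:
    """Toy sketch of MORPHIN-style adaptation (names and thresholds are
    assumptions): drift-aware tabular Q-learning with an expandable
    action space."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1,
                 drift_window=20, drift_threshold=2.0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {}                      # (state, action) -> estimated value
        self.recent_rewards = []
        self.drift_window = drift_window
        self.drift_threshold = drift_threshold

    def expand_action_space(self, new_actions):
        # New actions join with default (0.0) Q-values; existing entries
        # are untouched, so prior policy knowledge is not discarded.
        for a in new_actions:
            if a not in self.actions:
                self.actions.append(a)

    def detect_drift(self, reward):
        # Crude drift test (assumption): flag a reward that deviates
        # strongly from the mean of the recent sliding window.
        self.recent_rewards.append(reward)
        if len(self.recent_rewards) <= self.drift_window:
            return False
        window = self.recent_rewards[-self.drift_window - 1:-1]
        mean = sum(window) / len(window)
        return abs(reward - mean) > self.drift_threshold

    def act(self, state):
        # Epsilon-greedy action selection over the current action set.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # On detected drift, boost epsilon so the agent re-explores
        # instead of being retrained from scratch.
        if self.detect_drift(reward):
            self.epsilon = min(1.0, self.epsilon * 3)
        best_next = max(self.q.get((next_state, a), 0.0)
                        for a in self.actions)
        td_target = reward + self.gamma * best_next
        key = (state, action)
        self.q[key] = self.q.get(key, 0.0) + self.alpha * (
            td_target - self.q.get(key, 0.0))
```

In this sketch, a stable reward stream leaves epsilon alone, while an abrupt reward shift triggers the drift test and triples the exploration rate; adding an action mid-run simply extends the argmax over a larger set.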

๐Ÿ“ Abstract
Reinforcement Learning (RL) agents often struggle in real-world applications where environmental conditions are non-stationary, particularly when reward functions shift or the available action space expands. This paper introduces MORPHIN, a self-adaptive Q-learning framework that enables on-the-fly adaptation without full retraining. By integrating concept drift detection with dynamic adjustments to learning and exploration hyperparameters, MORPHIN adapts agents to changes in both the reward function and on-the-fly expansions of the agent's action space, while preserving prior policy knowledge to prevent catastrophic forgetting. We validate our approach on a Gridworld benchmark and a traffic signal control simulation. The results demonstrate that MORPHIN achieves superior convergence speed and continuous adaptation compared to a standard Q-learning baseline, improving learning efficiency by up to 1.7×.
Problem

Research questions and friction points this paper is trying to address.

- Reinforcement Learning
- non-stationary environments
- reward function shift
- action space expansion
- catastrophic forgetting
Innovation

Methods, ideas, or system contributions that make the work stand out.

- self-adaptive Q-learning
- concept drift detection
- dynamic action space
- catastrophic forgetting prevention
- non-stationary environments