Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions

📅 2025-09-29
🏛️ 2025 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C)
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of reinforcement learning agents failing in real-world non-stationary environments due to shifting reward functions or dynamically expanding action spaces. To tackle this, the authors propose MORPHIN, a novel framework that, for the first time, integrates concept drift detection with Q-learning. MORPHIN enables online continual learning without full retraining by adaptively adjusting exploration hyperparameters, dynamically expanding the action space, and preserving historical policy knowledge to mitigate catastrophic forgetting. Evaluated in Gridworld and traffic signal control simulations, MORPHIN achieves up to 1.7× faster convergence than standard Q-learning, significantly enhancing agent adaptability in non-stationary settings.
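The mechanism described above can be illustrated with a minimal sketch. The class below is an assumption-laden toy, not the paper's implementation: it pairs tabular Q-learning with a crude sliding-window drift test (the window size, threshold, and epsilon-boost factor are all invented for illustration), boosts exploration when drift fires, and grows the action set in place so existing Q-values, i.e. prior policy knowledge, are preserved rather than retrained from scratch.

```python
import random


class AdaptiveQLearner:
    """Toy sketch of MORPHIN-style adaptation (names and thresholds are
    assumptions): drift-aware tabular Q-learning with an expandable
    action space."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1,
                 drift_window=20, drift_threshold=2.0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {}                      # (state, action) -> estimated value
        self.recent_rewards = []
        self.drift_window = drift_window
        self.drift_threshold = drift_threshold

    def expand_action_space(self, new_actions):
        # New actions join with default (0.0) Q-values; existing entries
        # are untouched, so prior policy knowledge is not discarded.
        for a in new_actions:
            if a not in self.actions:
                self.actions.append(a)

    def detect_drift(self, reward):
        # Crude drift test (assumption): flag a reward that deviates
        # strongly from the mean of the recent sliding window.
        self.recent_rewards.append(reward)
        if len(self.recent_rewards) <= self.drift_window:
            return False
        window = self.recent_rewards[-self.drift_window - 1:-1]
        mean = sum(window) / len(window)
        return abs(reward - mean) > self.drift_threshold

    def act(self, state):
        # Epsilon-greedy action selection over the current action set.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # On detected drift, boost epsilon so the agent re-explores
        # instead of being retrained from scratch.
        if self.detect_drift(reward):
            self.epsilon = min(1.0, self.epsilon * 3)
        best_next = max(self.q.get((next_state, a), 0.0)
                        for a in self.actions)
        td_target = reward + self.gamma * best_next
        key = (state, action)
        self.q[key] = self.q.get(key, 0.0) + self.alpha * (
            td_target - self.q.get(key, 0.0))
```

In this sketch, a stable reward stream leaves epsilon alone, while an abrupt reward shift triggers the drift test and triples the exploration rate; adding an action mid-run simply extends the argmax over a larger set.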

๐Ÿ“ Abstract
Reinforcement Learning (RL) agents often struggle in real-world applications where environmental conditions are non-stationary, particularly when reward functions shift or the available action space expands. This paper introduces MORPHIN, a self-adaptive Q-learning framework that enables on-the-fly adaptation without full retraining. By integrating concept drift detection with dynamic adjustments to learning and exploration hyperparameters, MORPHIN adapts agents to changes in both the reward function and on-the-fly expansions of the agent's action space, while preserving prior policy knowledge to prevent catastrophic forgetting. We validate our approach on a Gridworld benchmark and a traffic signal control simulation. The results demonstrate that MORPHIN achieves superior convergence speed and continuous adaptation compared to a standard Q-learning baseline, improving learning efficiency by up to 1.7×.
Problem

Research questions and friction points this paper is trying to address.

- Reinforcement Learning
- non-stationary environments
- reward function shift
- action space expansion
- catastrophic forgetting
Innovation

Methods, ideas, or system contributions that make the work stand out.

- self-adaptive Q-learning
- concept drift detection
- dynamic action space
- catastrophic forgetting prevention
- non-stationary environments