Mitigating the Stability-Plasticity Dilemma in Adaptive Train Scheduling with Curriculum-Driven Continual DQN Expansion

📅 2024-08-19
🏛️ arXiv.org
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
In dynamic multi-agent train scheduling, continual learning faces a fundamental trade-off between knowledge stability and policy plasticity. Method: The paper proposes Continual DQN Expansion (CDE), a curriculum-driven framework that dynamically generates and adjusts Q-function subspaces, combining Elastic Weight Consolidation (EWC) regularization with adaptive rational activation functions to mitigate catastrophic forgetting while preserving adaptability to environmental change. Unlike conventional continual reinforcement learning approaches, CDE structures the task sequence as a curriculum of adjacent skills that build on one another to improve policy transfer. Contribution/Results: Evaluated in adaptive train scheduling scenarios, CDE improves learning efficiency and cross-task generalization, outperforming RL baselines and other methods adapted for continual learning, and offers a scalable, robust approach to adaptive scheduling in complex transportation systems.
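
The summary names EWC regularization as the stability mechanism. Below is a minimal sketch of the standard diagonal-Fisher EWC penalty that would be added to the DQN TD loss, assuming PyTorch; the function and variable names (ewc_penalty, fisher, old_params) and the default strength lam are illustrative, not the paper's exact formulation.

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=0.4):
    """Quadratic EWC penalty: 0.5 * lam * sum_i F_i * (theta_i - theta*_i)^2.

    fisher and old_params are dicts keyed by parameter name, recorded after
    the previous curriculum task converged (diagonal Fisher approximation).
    lam is an illustrative default, not the paper's reported value.
    """
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        if name in fisher:
            # Penalize drift away from parameters important to earlier tasks.
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# Usage (hypothetical names): total loss on the current curriculum task
# loss = td_loss + ewc_penalty(q_network, fisher_prev, params_prev)
```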

📝 Abstract
A continual learning agent builds on previous experiences to develop increasingly complex behaviors by adapting to non-stationary and dynamic environments while preserving previously acquired knowledge. However, scaling these systems presents significant challenges, particularly in balancing the preservation of previous policies with the adaptation of new ones to current environments. This balance, known as the stability-plasticity dilemma, is especially pronounced in complex multi-agent domains such as the train scheduling problem, where environmental and agent behaviors are constantly changing, and the search space is vast. In this work, we propose addressing these challenges in the train scheduling problem using curriculum learning. We design a curriculum with adjacent skills that build on each other to improve generalization performance. Introducing a curriculum with distinct tasks introduces non-stationarity, which we address by proposing a new algorithm: Continual Deep Q-Network (DQN) Expansion (CDE). Our approach dynamically generates and adjusts Q-function subspaces to handle environmental changes and task requirements. CDE mitigates catastrophic forgetting through EWC while ensuring high plasticity using adaptive rational activation functions. Experimental results demonstrate significant improvements in learning efficiency and adaptability compared to RL baselines and other adapted methods for continual learning, highlighting the potential of our method in managing the stability-plasticity dilemma in the adaptive train scheduling setting.
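
The abstract credits the method's plasticity to adaptive rational activation functions. The sketch below shows one common learnable rational (Padé-style) activation in PyTorch, with a degree-3 numerator and a pole-free degree-2 denominator; this parameterization is an assumption for illustration, not necessarily the exact variant used in CDE.

```python
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    """Learnable rational activation R(x) = P(x) / Q(x).

    Minimal sketch: cubic numerator, quadratic denominator with absolute-value
    coefficients so Q(x) >= 1 (no poles). Coefficients are trained jointly with
    the Q-network, which is what lets the activation adapt across tasks.
    Degrees and initialization (identity-like) are illustrative choices.
    """
    def __init__(self):
        super().__init__()
        self.num = nn.Parameter(torch.tensor([0.0, 1.0, 0.0, 0.0]))  # a0..a3
        self.den = nn.Parameter(torch.tensor([0.0, 0.0]))            # b1..b2

    def forward(self, x):
        p = sum(a * x ** i for i, a in enumerate(self.num))
        q = 1.0 + sum(b.abs() * x ** (i + 1) for i, b in enumerate(self.den))
        return p / q
```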
Problem

Research questions and friction points this paper is trying to address.

Balancing stability and plasticity in continual learning
Addressing non-stationarity in dynamic train scheduling environments
Mitigating catastrophic forgetting while adapting to new tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum-driven continual DQN expansion
Dynamic generation and adjustment of Q-function subspaces
EWC regularization combined with adaptive rational activation functions (see the sketch after this list)
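
A rough sketch of how the pieces in this list could fit together as a curriculum-driven expansion loop. All names (make_head, train_on, evaluate_on) are hypothetical placeholders, and the accept/discard rule is one illustrative reading of "dynamically generates and adjusts Q-function subspaces", not the authors' exact procedure.

```python
def train_curriculum(tasks, subspace, make_head, train_on, evaluate_on):
    """Hypothetical CDE-style control loop over a curriculum of tasks.

    Assumes `subspace` starts non-empty (e.g., one base Q-function head).
    For each task, a candidate head is generated and trained with the
    EWC-regularized DQN objective; it is appended to the subspace only if it
    beats every existing head on that task, otherwise it is discarded so the
    subspace stays compact.
    """
    for task in tasks:
        candidate = make_head(subspace)        # tentatively expand the subspace
        train_on(candidate, task)              # DQN updates + EWC penalty on old heads
        best_existing = max(evaluate_on(head, task) for head in subspace)
        if evaluate_on(candidate, task) > best_existing:
            subspace.append(candidate)         # expansion accepted for this task
    return subspace
```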