🤖 AI Summary
This work addresses three key challenges in applying reinforcement learning to optimization in dynamic environments: dynamically evolving state-action spaces, large problem scale, and sparse rewards. To this end, we propose GNN-DT, a novel Decision Transformer (DT) architecture that integrates Graph Neural Networks (GNNs) to explicitly model state-action dependencies and construct adaptive, dynamic graph-structured representations. We further introduce residual connections between input and output tokens to enhance adaptability and generalization under environmental shifts. Our method jointly leverages trajectory-based supervised learning and GNN-encoded structural priors. Evaluated on electric vehicle charging scheduling, it achieves substantial improvements in sample efficiency, reducing required training trajectories by over 50%, while demonstrating strong robustness to unseen environment distributions and expanded action spaces. It consistently outperforms existing DT baselines, establishing new state-of-the-art performance for dynamic decision-making under sparsity and structural uncertainty.
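The two architectural ideas named above can be illustrated with a minimal numpy sketch; this is a hypothetical toy, not the paper's implementation, and the mean-aggregation GNN layer, weight shapes, and `decision_step` helper are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch (NOT the authors' code) of the two ideas in the summary:
# (1) a one-layer GNN embedder: each node's embedding aggregates (averages)
#     its neighbours' features, followed by a shared linear map;
# (2) an input-output residual token connection: the transformer's output
#     token is added to the embedded input token before decoding the action.

rng = np.random.default_rng(0)

def gnn_embed(node_feats, adj, W):
    """One round of mean-aggregation message passing (illustrative only)."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    agg = adj @ node_feats / deg          # average neighbour features
    return np.tanh(agg @ W)               # shared linear map + nonlinearity

def decision_step(token_in, transformer_out):
    """Residual connection between an input token and its output token."""
    return transformer_out + token_in      # output token reuses the input token

# Toy dynamic graph: 4 nodes (e.g. charging stations), 3-dim features.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)
feats = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 8))

tokens = gnn_embed(feats, adj, W)                  # (4, 8) input tokens
transformer_out = np.zeros_like(tokens)            # stand-in transformer output
out = decision_step(tokens, transformer_out)
assert np.allclose(out, tokens)  # with zero transformer output, inputs pass through
```

Because the graph adjacency can change between episodes, the same shared weights `W` apply to any number of nodes, which is one way such an architecture can cope with dynamic and expanded action spaces.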
📝 Abstract
Reinforcement Learning (RL) methods applied to real-world optimization problems often face dynamic state-action spaces, large problem scales, and sparse rewards, posing significant challenges to convergence, scalability, and efficient exploration of the solution space. This study introduces GNN-DT, a novel Decision Transformer (DT) architecture that integrates Graph Neural Network (GNN) embedders with a residual connection between input and output tokens, which is crucial for handling dynamic environments. By learning from previously collected trajectories, GNN-DT reduces dependence on accurate simulators and mitigates the sparse-reward limitations of online RL algorithms. We evaluate GNN-DT on the complex electric vehicle (EV) charging optimization problem and show that it achieves superior performance while requiring significantly fewer training trajectories, thus improving sample efficiency over existing DT baselines. Furthermore, GNN-DT exhibits robust generalization to unseen environments and larger action spaces, addressing a critical gap in prior DT-based approaches.