🤖 AI Summary
This work addresses the challenges of dynamic optimization problems, where environmental changes are difficult to detect effectively and existing evolutionary algorithms rely on manually designed adaptation strategies with limited generalization. To overcome these limitations, the paper proposes a novel two-layer framework that integrates meta black-box optimization with deep reinforcement learning. For the first time, a deep Q-network is employed to control the dynamic optimizer, enabling autonomous perception of environmental shifts and adaptive adjustment of evolutionary parameters without human intervention. The proposed method demonstrates strong cross-problem generalization, exhibits more flexible search behavior, and significantly outperforms state-of-the-art algorithms across multiple dynamic benchmark suites of varying difficulty.
📝 Abstract
Dynamic Optimization Problems (DOPs) are challenging to address due to their complex nature, i.e., dynamic environment variation. Evolutionary Computation methods are generally well-suited to solving DOPs since they resemble dynamic biological evolution. However, existing evolutionary dynamic optimization methods rely heavily on human-crafted adaptive strategies to detect environment variation in DOPs and then adapt the search strategy accordingly. These hand-crafted strategies may perform ineffectively in out-of-the-box scenarios. In this paper, we propose a reinforcement learning-assisted approach that enables automated variation detection and self-adaptation in evolutionary algorithms. This is achieved by borrowing the bi-level learning-to-optimize idea from recent Meta-Black-Box Optimization works. We use a deep Q-network as an optimization dynamics detector and search strategy adapter: it takes the current-step optimization state as input and then dictates the desired control parameters for the underlying evolutionary algorithm's next optimization step. The learning objective is to maximize the expected performance gain across a problem distribution. Once trained, our approach can generalize to unseen DOPs with automated environment variation detection and self-adaptation. To facilitate comprehensive validation, we further construct a testbed of DOPs ranging from easy to difficult, with diverse synthetic instances. Extensive benchmark results demonstrate the flexible search behavior and superior performance of our approach in solving DOPs, compared to state-of-the-art baselines.
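To make the control loop concrete, below is a minimal, heavily simplified sketch of the idea the abstract describes: an RL agent observes the optimizer's state each step and dictates a control parameter (here, the mutation step size) to the underlying evolutionary search, with the fitness improvement as reward. Tabular Q-learning stands in for the paper's deep Q-network, and the dynamic sphere objective, the stagnation-based state feature, and the discrete action set are all illustrative assumptions, not the paper's actual design.

```python
import random

def make_dynamic_sphere(dim=5, shift_every=50, rng=None):
    """Sphere objective whose optimum jumps periodically (environment variation)."""
    rng = rng or random.Random(0)
    optimum = [0.0] * dim
    last_epoch = 0
    def objective(x, t):
        nonlocal optimum, last_epoch
        if t // shift_every > last_epoch:          # environment change event
            last_epoch = t // shift_every
            optimum = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
        return sum((xi - oi) ** 2 for xi, oi in zip(x, optimum))
    return objective

ACTIONS = [0.01, 0.1, 0.5, 1.0]                    # candidate mutation step sizes

def state_of(stagnation):
    """Coarse optimization-state feature: steps since the last improvement."""
    return min(stagnation // 5, 3)                 # buckets 0..3

def run(steps=500, dim=5, seed=1):
    rng = random.Random(seed)
    f = make_dynamic_sphere(dim=dim, rng=random.Random(seed + 1))
    x = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
    Q = {}                                         # (state, action_idx) -> value
    eps, alpha, gamma = 0.2, 0.3, 0.9
    stagnation = 0
    for t in range(1, steps):
        fx = f(x, t)
        s = state_of(stagnation)
        # epsilon-greedy action choice: which step size to use this step
        if rng.random() < eps:
            a = rng.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q.get((s, i), 0.0))
        sigma = ACTIONS[a]
        cand = [xi + rng.gauss(0.0, sigma) for xi in x]  # (1+1)-style mutation
        fc = f(cand, t)
        reward = fx - fc                            # per-step performance gain
        if fc < fx:
            x, stagnation = cand, 0
        else:
            stagnation += 1
        s2 = state_of(stagnation)
        target = reward + gamma * max(Q.get((s2, i), 0.0) for i in range(len(ACTIONS)))
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
    return {"final_fitness": f(x, steps), "q_entries": len(Q)}

result = run()
```

In the paper's full framework, the tabular state would be replaced by a richer optimization-state representation fed to a deep Q-network, and training would span a distribution of problem instances so the learned controller generalizes to unseen DOPs.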