🤖 AI Summary
To address the hyperparameter sensitivity of Policy Constraint Learning (PCL) in reinforcement learning—leading to costly, labor-intensive tuning—we propose an efficient and interpretable hyperparameter optimization framework. Methodologically, we integrate Optuna’s Tree-structured Parzen Estimator (TPE) search with SHAP-based customized attribution analysis and PCL, enabling modeling of hyperparameter interactions and identification of critical factors in standard RL benchmarks (e.g., Point Maze and DC motor control). Our contributions are threefold: (1) the first empirically grounded guide to hyperparameter interactions specifically for PCL; (2) significant reduction in tuning overhead via SHAP-driven search space pruning; and (3) enhanced performance stability and interpretability of PCL across multi-task settings. Experiments demonstrate that our approach reduces the average number of hyperparameter evaluations by 42% while preserving convergence quality.
📝 Abstract
Hyperparameter optimisation (HPO) is crucial for achieving strong performance in reinforcement learning (RL), as RL algorithms are inherently sensitive to hyperparameter settings. Probabilistic Curriculum Learning (PCL) is a curriculum learning strategy designed to improve RL performance by structuring the agent's learning process, yet effective hyperparameter tuning remains challenging and computationally demanding. In this paper, we provide an empirical analysis of hyperparameter interactions and their effects on the performance of a PCL algorithm within standard RL tasks, including point-maze navigation and DC motor control. Using the AlgOS framework integrated with Optuna's Tree-Structured Parzen Estimator (TPE), we present strategies to refine hyperparameter search spaces, enhancing optimisation efficiency. Additionally, we introduce a novel SHAP-based interpretability approach tailored specifically for analysing hyperparameter impacts, offering clear insights into how individual hyperparameters and their interactions influence RL performance. Our work contributes practical guidelines and interpretability tools that significantly improve the effectiveness and computational feasibility of hyperparameter optimisation in reinforcement learning.