🤖 AI Summary
To address the reliance on manual expertise, low efficiency, and poor consistency in high-dose-rate (HDR) intracavitary brachytherapy planning for locally advanced cervical cancer, this paper proposes a fully automated treatment planning framework that integrates deep reinforcement learning (DRL) with dose-based optimization. The method follows a hierarchical two-stage paradigm. In Stage I, a Deep Q-Network (DQN) agent iteratively adjusts the treatment planning parameters (TPPs) that set the trade-off between target coverage and organ-at-risk sparing; its state combines multi-organ dose–volume histogram (DVH) metrics with the current TPP values, and its reward encodes clinical dose objectives and safety constraints. In Stage II, a customized Adam-based optimizer computes the dwell-time distribution for the selected TPPs using 3D dose calculation and a clinically informed loss function. Evaluated on a cohort of patients with complex applicator geometries, the framework achieves a mean plan score of 93.89% on unseen test patients, compared with 91.86% for the clinical (manually created) plans, while maintaining full target coverage and reducing CTV hot spots in most cases.
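As a rough illustration of Stage II, the sketch below minimizes a TPP-weighted dose penalty over non-negative dwell times with a hand-written Adam update. It is not the paper's optimizer or loss: the quadratic under/overdose penalties, the squared-variable trick for non-negativity, and all names (`optimize_dwell_times`, `D_target`, `D_oar`, `w_target`, `w_oar`) are assumptions made for this example, with the two weights standing in for the TPPs.

```python
import numpy as np

def optimize_dwell_times(D_target, D_oar, rx_dose, oar_limit,
                         w_target, w_oar, n_iters=500, lr=0.05, seed=0):
    """Adam on a weighted quadratic dose penalty (illustrative loss only).

    D_target, D_oar: dose-rate matrices of shape (n_voxels, n_dwell), so that
    dose = D @ t for dwell times t. w_target and w_oar play the role of TPPs.
    """
    rng = np.random.default_rng(seed)
    n_dwell = D_target.shape[1]
    x = rng.normal(0.0, 0.1, n_dwell)   # unconstrained; t = x**2 keeps t >= 0
    m = np.zeros(n_dwell)               # Adam first-moment estimate
    v = np.zeros(n_dwell)               # Adam second-moment estimate
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    history = []

    for k in range(1, n_iters + 1):
        t = x ** 2
        # Negative where the target is underdosed, zero elsewhere.
        under = np.minimum(D_target @ t - rx_dose, 0.0)
        # Positive where the OAR exceeds its limit, zero elsewhere.
        over = np.maximum(D_oar @ t - oar_limit, 0.0)
        loss = w_target * np.mean(under ** 2) + w_oar * np.mean(over ** 2)
        history.append(float(loss))

        # Gradient with respect to t, then chain rule through t = x**2.
        grad_t = (2.0 * w_target / under.size) * (D_target.T @ under) \
               + (2.0 * w_oar / over.size) * (D_oar.T @ over)
        grad_x = grad_t * 2.0 * x

        # Standard Adam update with bias correction.
        m = beta1 * m + (1.0 - beta1) * grad_x
        v = beta2 * v + (1.0 - beta2) * grad_x ** 2
        m_hat = m / (1.0 - beta1 ** k)
        v_hat = v / (1.0 - beta2 ** k)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)

    return x ** 2, history
```

Raising `w_oar` relative to `w_target` pushes the solution toward OAR sparing at the cost of coverage, which is the kind of trade-off the Stage I agent is described as tuning.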
📝 Abstract
High-dose-rate (HDR) brachytherapy plays a critical role in the treatment of locally advanced cervical cancer but remains highly dependent on manual treatment planning expertise. The objective of this study is to develop a fully automated HDR brachytherapy planning framework that integrates reinforcement learning (RL) and dose-based optimization to generate clinically acceptable treatment plans with improved consistency and efficiency. We propose a hierarchical two-stage autoplanning framework. In the first stage, a deep Q-network (DQN)-based RL agent iteratively selects treatment planning parameters (TPPs), which control the trade-offs between target coverage and organ-at-risk (OAR) sparing. The agent's state representation includes both dose-volume histogram (DVH) metrics and current TPP values, while its reward function incorporates clinical dose objectives and safety constraints, including D90, V150, and V200 for targets and D2cc for all relevant OARs (bladder, rectum, sigmoid, small bowel, and large bowel). In the second stage, a customized Adam-based optimizer computes the corresponding dwell time distribution for the selected TPPs using a clinically informed loss function. The framework was evaluated on a cohort of patients with complex applicator geometries. The proposed framework successfully learned clinically meaningful TPP adjustments across diverse patient anatomies. For the unseen test patients, the RL-based automated planning method achieved an average score of 93.89%, outperforming the clinical plans, which averaged 91.86%. These findings are notable because the score improvements were achieved while maintaining full target coverage and reducing CTV hot spots in most cases.
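The Stage I ingredients named above (DVH metrics, a score-based reward, and discrete TPP adjustments) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the equal-volume voxel model, the pass/fail scoring rule, the multiplicative TPP step, and every function name here are hypothetical, and the Q-values would come from a trained DQN that is not shown.

```python
import numpy as np

def d90(target_dose):
    """Dose covering 90% of the target volume (10th percentile of voxel doses)."""
    return float(np.percentile(target_dose, 10))

def v_percent(target_dose, rx_dose, pct):
    """Fraction of target voxels receiving at least pct% of the prescription."""
    return float(np.mean(target_dose >= rx_dose * pct / 100.0))

def d2cc(oar_dose, voxel_volume_cc):
    """Minimum dose within the hottest 2 cc of an OAR (equal-volume voxels assumed)."""
    n = max(1, int(round(2.0 / voxel_volume_cc)))
    n = min(n, oar_dose.size)
    hottest = np.sort(oar_dose)[::-1][:n]
    return float(hottest[-1])

def plan_score(target_dose, oar_doses, rx_dose, oar_limits, voxel_volume_cc,
               v150_max, v200_max):
    """Hypothetical score: fraction of dose criteria that are satisfied."""
    checks = [
        d90(target_dose) >= rx_dose,
        v_percent(target_dose, rx_dose, 150) <= v150_max,
        v_percent(target_dose, rx_dose, 200) <= v200_max,
    ]
    checks += [d2cc(oar_doses[name], voxel_volume_cc) <= limit
               for name, limit in oar_limits.items()]
    return sum(checks) / len(checks)

def reward(score_before, score_after):
    """Assumed reward: improvement in plan score after one TPP adjustment."""
    return score_after - score_before

def select_action(q_values, epsilon, rng):
    """Epsilon-greedy choice over the Q-values predicted for the current state."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def apply_action(tpps, action, step=1.5):
    """Action 2*i multiplies TPP i by `step`; action 2*i + 1 divides it."""
    new_tpps = np.array(tpps, dtype=float)
    i, decrease = divmod(action, 2)
    new_tpps[i] = new_tpps[i] / step if decrease else new_tpps[i] * step
    return new_tpps
```

In a full loop, each adjusted TPP vector would be passed to the Stage II optimizer, the resulting dose re-scored, and the score change used as the learning signal.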