🤖 AI Summary
In HDR prostate brachytherapy, needle placement planning heavily relies on physician expertise, resulting in low efficiency and inconsistent plan quality. Method: This study introduces deep reinforcement learning (specifically the Proximal Policy Optimization algorithm) for fully automated treatment planning—extracting anatomical features from preoperative imaging, employing a multi-round, needle-by-needle optimization strategy, and incorporating a clinically informed dosimetric reward function. Results: The proposed method achieves equivalent performance to manual plans for prostate V100 and rectal D2cc (p > 0.05), while significantly outperforming them for prostate V150 and urethral D20% (p < 0.05). It reduces the average number of needles by two, substantially decreases inter-physician variability, and improves both planning consistency and efficiency.
📝 Abstract
Purpose: In high-dose-rate (HDR) prostate brachytherapy procedures, the pattern of needle placement relies solely on physician experience. We investigated the feasibility of using reinforcement learning (RL) to provide needle positions and dwell times based on patient anatomy during the pre-planning stage. This approach would reduce procedure time and ensure consistent plan quality. Materials and Methods: We train an RL agent that, after observing the environment, adjusts the position of one selected needle and all of its dwell times to maximize a predefined reward function. The agent then moves on to the next needle until all needles have been adjusted. Multiple rounds are played by the agent until the maximum number of rounds is reached. Plan data from 11 prostate HDR boost patients (1 for training and 10 for testing) treated in our clinic were included in this study. The dosimetric metrics and the number of needles used in the RL plans were compared to those of the clinical plans (ground truth). Results: On average, RL plans and clinical plans have very similar prostate coverage (prostate V100) and rectum D2cc (no statistically significant difference), while RL plans have a statistically significantly smaller prostate hotspot (prostate V150) and lower urethra D20%. Moreover, RL plans use two fewer needles than clinical plans on average. Conclusion: We present the first study demonstrating the feasibility of using reinforcement learning to autonomously generate clinically practical HDR prostate brachytherapy plans. This RL-based method achieved plan quality equal to or better than that of conventional clinical approaches while requiring fewer needles. With minimal data requirements and strong generalizability, this approach has substantial potential to standardize brachytherapy planning, reduce clinical variability, and improve patient outcomes.
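The multi-round, needle-by-needle strategy described in the abstract can be sketched as a nested loop. This is only an illustration of the control flow: the `reward` and `propose_adjustment` functions below are hypothetical stand-ins (a toy objective and a random-search proposal), not the paper's PPO policy, state encoding, or clinically informed dosimetric reward.

```python
# Sketch of the multi-round, needle-by-needle optimization loop.
# Assumptions: a plan is a list of 1-D needle positions; the trained
# PPO agent is replaced by a random proposal accepted only on reward
# improvement. Only the loop structure mirrors the described method.
import random

def reward(plan):
    # Stand-in for the dosimetric reward (the real one would combine
    # prostate V100/V150, urethra D20%, rectum D2cc, etc.).
    return -sum(abs(p) for p in plan)

def propose_adjustment(plan, i):
    # Stand-in for the policy's action: a new position for needle i
    # (the real agent also outputs the dwell times on that needle).
    return plan[i] + random.uniform(-1.0, 1.0)

def optimize(plan, max_rounds=3):
    plan = list(plan)
    for _ in range(max_rounds):        # multiple rounds are played...
        for i in range(len(plan)):     # ...adjusting one needle at a time
            candidate = list(plan)
            candidate[i] = propose_adjustment(plan, i)
            if reward(candidate) > reward(plan):
                plan = candidate       # keep only improving adjustments
    return plan

random.seed(0)
initial = [2.0, -3.0, 1.5]
final = optimize(initial)
assert reward(final) >= reward(initial)
```

Because each adjustment is accepted only if the reward improves, the plan quality is non-decreasing across rounds; the actual method instead relies on a learned PPO policy rather than this greedy accept/reject rule.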