🤖 AI Summary
Problem: Intensity-modulated radiation therapy (IMRT) inverse planning for prostate cancer relies heavily on large annotated datasets, generalizes poorly, and lacks robustness. Method: This paper proposes a deep reinforcement learning (DRL) framework based on Actor-Critic with Experience Replay (ACER), the first to integrate experience replay into automated radiotherapy treatment planning. The agent tunes treatment planning parameters (TPPs) and generates plans end to end after training on a single patient case, eliminating the need for extensive labeled data. Clinical constraints are embedded via dose-volume histogram (DVH) modeling, with DVHs serving as the agent's input, and adversarial robustness is validated with Fast Gradient Sign Method (FGSM) attacks. Results: Across more than 300 test cases, 93.09% of plans achieve the maximum ProKnow score of 9, with a mean score of 8.93 ± 0.27, compared with 6.20 ± 1.84 before ACER-based planning. The method generalizes well from a single training case and remains robust against adversarial perturbations.
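What lets ACER learn from replayed experience is a correction for the mismatch between the behaviour policy that generated a stored trajectory and the current policy. The sketch below computes the Retrace targets used by the standard ACER algorithm, where each importance ratio is truncated at 1. It is an illustration of the general technique, not the authors' implementation: the function name, the argument layout, the discount `gamma`, and the toy trajectory are assumptions, and a discrete action space over TPP adjustments is assumed.

```python
import numpy as np

def retrace_targets(rewards, q_taken, values, rhos, bootstrap_value, gamma=0.99):
    """Compute Retrace targets Q_ret for one replayed trajectory (illustrative sketch).

    rewards[i]      : reward r_i received after action a_i
    q_taken[i]      : critic estimate Q(x_i, a_i)
    values[i]       : V(x_i) = sum_a pi(a | x_i) Q(x_i, a) under the current policy
    rhos[i]         : importance ratio pi(a_i | x_i) / mu(a_i | x_i), where mu is the
                      behaviour policy that generated the stored trajectory
    bootstrap_value : V of the state after the last step, or 0.0 if it is terminal
    """
    q_ret = bootstrap_value
    targets = np.zeros(len(rewards))
    for i in reversed(range(len(rewards))):
        q_ret = rewards[i] + gamma * q_ret               # one-step backup
        targets[i] = q_ret                               # regression target for Q(x_i, a_i)
        c_i = min(1.0, rhos[i])                          # truncate the importance ratio at 1
        q_ret = c_i * (q_ret - q_taken[i]) + values[i]   # off-policy correction before stepping back
    return targets

# Hypothetical 3-step trajectory sampled from the replay buffer.
targets = retrace_targets(
    rewards=[1.0, 1.0, 1.0],
    q_taken=[2.0, 2.0, 2.0],
    values=[2.0, 2.0, 2.0],
    rhos=[1.0, 1.0, 1.0],
    bootstrap_value=0.0,
    gamma=0.5,
)
# With all ratios equal to 1 and q_taken equal to values, the targets reduce to
# plain discounted returns: 1.75, 1.5, 1.0.
```

At the other extreme, ratios of 0 make each target fall back to the one-step estimate r_i + γV(x_{i+1}), which is how truncation limits the variance introduced by stale replayed data.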
📝 Abstract
Background: Real-time treatment planning in IMRT is challenging because of complex beam interactions. AI has improved planning automation, but existing models require large, high-quality datasets and lack universal applicability. Deep reinforcement learning (DRL) offers a promising alternative by mimicking the trial-and-error process of human planners. Purpose: To develop a stochastic policy-based DRL agent for automatic treatment planning that trains efficiently, applies broadly, and is robust against adversarial attacks generated with the Fast Gradient Sign Method (FGSM). Methods: Using the Actor-Critic with Experience Replay (ACER) architecture, the agent tunes treatment planning parameters (TPPs) in inverse planning. The agent works on prostate cancer IMRT cases with dose-volume histograms (DVHs) as input; it is trained on a single patient case, validated on two independent cases, and tested on 300+ plans across three datasets. Plan quality is assessed with ProKnow scores, and robustness is tested against FGSM adversarial attacks. Results: Despite training on a single case, the model generalizes well. Before ACER-based planning, the mean plan score was 6.20 ± 1.84; afterwards, 93.09% of cases achieved a perfect score of 9, with a mean of 8.93 ± 0.27. The agent effectively prioritizes optimal TPP adjustments and remains robust against adversarial attacks. Conclusions: The ACER-based DRL agent enables efficient, high-quality treatment planning for prostate cancer IMRT, demonstrating strong generalizability and robustness.
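The abstract does not detail how the FGSM attack is applied to the agent. A minimal sketch, assuming the attack perturbs the DVH state vector so as to push the agent away from the action it would otherwise choose, follows. The linear softmax policy, the 5 discrete TPP adjustments, the 12-point DVH state, the `eps` value, and the clipping of perturbed entries to [0, 1] (treating them as volume fractions) are all hypothetical stand-ins for the paper's actual network and settings.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def nll_and_grad(W, b, x, action):
    """Return -log pi(action | x) for a linear softmax policy and its gradient w.r.t. x."""
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[action] = 1.0
    loss = -np.log(p[action])
    grad_x = W.T @ (p - onehot)       # chain rule through the logits z = W x + b
    return loss, grad_x

def fgsm(W, b, x, action, eps):
    """One FGSM step: move x by eps along the sign of the loss gradient."""
    _, grad_x = nll_and_grad(W, b, x, action)
    x_adv = x + eps * np.sign(grad_x)
    return np.clip(x_adv, 0.0, 1.0)   # keep entries valid volume fractions (assumption)

# Hypothetical setup: 5 discrete TPP adjustments, 12-point DVH-like state.
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(5, 12))
b = np.zeros(5)
x = np.linspace(0.9, 0.1, 12)         # monotonically decreasing, like a cumulative DVH
action = int(np.argmax(W @ x + b))    # action the policy picks on the clean state
x_adv = fgsm(W, b, x, action, eps=0.01)
loss_clean, _ = nll_and_grad(W, b, x, action)
loss_adv, _ = nll_and_grad(W, b, x_adv, action)
```

The gradient is analytic here only because the toy policy is linear; a neural-network policy would obtain the input gradient through automatic differentiation. A robustness check in this spirit compares the agent's TPP decisions, and the resulting plan scores, on clean versus perturbed DVH inputs.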