Patient-specific vs Multi-Patient Vision Transformer for Markerless Tumor Motion Forecasting

📅 2025-07-10

🤖 AI Summary

This study addresses the limited accuracy of markerless respiratory motion forecasting for lung tumors in proton therapy. It proposes a Vision Transformer (ViT)-based framework that predicts tumor trajectories over a 1-second horizon from sequences of 16 digitally reconstructed radiographs (DRRs), which the authors describe as the first use of ViT for this task. Patient-specific models are compared systematically against multi-patient models under realistic constraints, including limited planning-phase data, anatomical variability, and clinical time requirements. Patient-specific models achieve higher accuracy on planning-phase data, whereas multi-patient models are more robust to inter-fractional anatomical variability and reach comparable performance on treatment-phase data without retraining. The key contributions are: (i) the adoption of the ViT architecture for markerless motion forecasting in proton therapy; and (ii) an empirical characterization of the trade-off between personalization and generalizability in clinical practice. The results suggest that multi-patient models are a practical out-of-the-box option for time-constrained clinical settings.
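The input/output framing described above (16 frames in, a short future trajectory out) can be illustrated with a sliding-window sketch. This is not the authors' pipeline: the function name `make_windows`, the 2D position format, and the number of future steps (`n_future=4`) are assumptions made for illustration, since the source gives the horizon as 1 second but does not state the frame rate.

```python
import numpy as np

def make_windows(positions, n_input=16, n_future=4):
    """Pair each run of n_input consecutive frames with the n_future
    tumor positions that immediately follow it.

    positions: array of shape (n_frames, 2), one tumor position per DRR frame.
    n_future is a hypothetical value; the true step count depends on the
    frame rate, which the source does not give.
    """
    positions = np.asarray(positions, dtype=float)
    n_samples = len(positions) - n_input - n_future + 1
    if n_samples <= 0:
        raise ValueError("sequence too short for the requested window sizes")

    # Frame indices of each input window; the DRR images themselves would be
    # gathered with these indices before being passed to a model.
    input_idx = np.stack([np.arange(i, i + n_input) for i in range(n_samples)])

    # Future positions the model is asked to forecast for each window.
    targets = np.stack(
        [positions[i + n_input : i + n_input + n_future] for i in range(n_samples)]
    )
    return input_idx, targets


# Example: 30 synthetic frames of a simple breathing-like motion.
t = np.arange(30)
demo_positions = np.column_stack([np.zeros(30), np.sin(2 * np.pi * t / 15)])
demo_idx, demo_targets = make_windows(demo_positions)
```

With 30 frames and these window sizes, the sketch yields 11 training pairs, each holding 16 input frame indices and 4 target positions.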

📝 Abstract

Background: Accurate forecasting of lung tumor motion is essential for precise dose delivery in proton therapy. While current markerless methods mostly rely on deep learning, transformer-based architectures remain unexplored in this domain, despite their proven performance in trajectory forecasting. Purpose: This work introduces a markerless forecasting approach for lung tumor motion using Vision Transformers (ViT). Two training strategies are evaluated under clinically realistic constraints: a patient-specific (PS) approach that learns individualized motion patterns, and a multi-patient (MP) model designed for generalization. The comparison explicitly accounts for the limited number of images that can be generated between planning and treatment sessions. Methods: Digitally reconstructed radiographs (DRRs) derived from planning 4DCT scans of 31 patients were used to train the MP model; a 32nd patient was held out for evaluation. PS models were trained using only the target patient's planning data. Both models used 16 DRRs per input and predicted tumor motion over a 1-second horizon. Performance was assessed using Average Displacement Error (ADE) and Final Displacement Error (FDE), on both planning (T1) and treatment (T2) data. Results: On T1 data, PS models outperformed MP models across all training set sizes, especially with larger datasets (up to 25,000 DRRs, p < 0.05). However, MP models demonstrated stronger robustness to inter-fractional anatomical variability and achieved comparable performance on T2 data without retraining. Conclusions: This is the first study to apply ViT architectures to markerless tumor motion forecasting. While PS models achieve higher precision, MP models offer robust out-of-the-box performance, well-suited for time-constrained clinical settings.
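The two metrics named in the abstract can be sketched as follows, using the definitions common in trajectory forecasting: ADE is the mean Euclidean distance between predicted and true positions over all forecast steps, and FDE is that distance at the last step only. The abstract does not spell out the paper's exact formulation, so the function name `displacement_errors` and the array layout here are assumptions.

```python
import numpy as np

def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one forecast trajectory.

    pred, truth: arrays of shape (n_steps, n_dims), in the same units
    (for example millimetres). Standard definitions are assumed, not
    taken from the paper.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must have the same shape")

    # Euclidean distance between prediction and ground truth at each step.
    dist = np.linalg.norm(pred - truth, axis=-1)

    ade = float(dist.mean())  # average over the whole horizon
    fde = float(dist[-1])     # error at the final forecast step
    return ade, fde


# Example: per-step errors are 0, 1, and 2 units.
truth_demo = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
pred_demo = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
ade_demo, fde_demo = displacement_errors(pred_demo, truth_demo)
```

In this example the error grows along the horizon, so FDE (2.0) exceeds ADE (1.0). Reporting both separates overall tracking quality from drift at the end of the forecast.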

Problem

Research questions and friction points this paper is trying to address.

Forecasting lung tumor motion for precise proton therapy.

Comparing patient-specific vs multi-patient Vision Transformer models.

Evaluating robustness to anatomical variability in clinical settings.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision Transformers for tumor motion forecasting

Patient-specific and multi-patient training strategies

Markerless approach using DRRs from 4DCT scans
