Any2Any 3D Diffusion Models with Knowledge Transfer: A Radiotherapy Planning Study

📅 2026-05-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited generalization of voxel-level dose prediction in radiotherapy planning across diverse clinical scenarios by proposing DiffKT3D, an Any2Any 3D diffusion framework. Built on a pretrained video diffusion model, DiffKT3D accepts flexible multimodal conditional inputs through an Any2Any conditioning paradigm that uses modality-specific embeddings and thereby avoids the computational overhead of cross-attention. A reinforcement learning post-training strategy, guided by a clinical scorecard, then aligns the model with institutional treatment preferences. Experiments show that DiffKT3D achieves a voxel-level mean absolute error (MAE) of 1.93, compared with 2.07 for the GDP-HMM challenge winner, and also performs better on image quality and adherence to clinical preferences.
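To make the idea of scorecard-guided post-training more concrete, the following is a minimal sketch of how a clinical scorecard could be turned into a scalar reward. It is not the authors' implementation: the piecewise-linear scoring rule, the function names, the metric names (`ptv_d95_gy`, `cord_dmax_gy`), and all breakpoints are hypothetical values chosen only for illustration.

```python
# Illustrative sketch only: the summary does not specify the scorecard format.
# Piecewise-linear scoring, metric names, and breakpoints are all assumptions.

def piecewise_linear_score(value, points):
    """Interpolate a score from (metric_value, score) breakpoints.

    `points` must be sorted by metric_value; values outside the range are
    clamped to the score of the nearest breakpoint.
    """
    if value <= points[0][0]:
        return points[0][1]
    if value >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)


def scorecard_reward(metrics, scorecard):
    """Sum the per-metric scores and normalise by the maximum attainable total."""
    total = sum(
        piecewise_linear_score(metrics[name], points)
        for name, points in scorecard.items()
    )
    max_total = sum(max(score for _, score in points) for points in scorecard.values())
    return total / max_total


# Hypothetical scorecard: reward target coverage, penalise spinal cord dose.
scorecard = {
    "ptv_d95_gy": [(60.0, 0.0), (70.0, 10.0)],   # higher is better
    "cord_dmax_gy": [(40.0, 5.0), (50.0, 0.0)],  # lower is better
}
metrics = {"ptv_d95_gy": 65.0, "cord_dmax_gy": 45.0}

reward = scorecard_reward(metrics, scorecard)
print(reward)  # 0.5 for these hypothetical values
```

In an RL post-training loop, a reward of this kind would be computed from dose metrics extracted from each predicted dose volume; how DiffKT3D actually extracts metrics and applies the reward is not described in this summary.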

📝 Abstract

Voxel-wise dose prediction is a critical yet challenging task in practical radiotherapy (RT) planning, as bespoke models trained from scratch often struggle to generalize across diverse clinical settings. Meanwhile, generative models trained on billion-scale datasets from vision domains have achieved impressive performance. Herein, we propose DiffKT3D, a unified Any2Any 3D diffusion framework that leverages prior knowledge from pretrained video diffusion models for efficient and clinically meaningful dose prediction. To enable flexible conditioning across multiple clinical modalities (CT, anatomical structures, body, beam settings, etc.), we introduce an Any2Any conditional paradigm utilizing modality-specific embeddings without cross-attention overhead. Further, we design a novel reinforcement learning (RL) post-training mechanism guided by a clinically informed Scorecard explicitly tailored to institutional treatment preferences. Compared with the winner of the GDP-HMM challenge, DiffKT3D sets a new state of the art in dose prediction by reducing voxel-level MAE from 2.07 to 1.93. In addition, DiffKT3D achieves superior image quality and preference match. These results demonstrate that transferring diffusion priors via modality-aware conditioning and clinically aligned RL post-training can provide a robust and generalizable solution for RT planning across various clinical scenarios.
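For readers unfamiliar with the headline metric, the sketch below shows one way a voxel-level MAE could be computed and what the reported change from 2.07 to 1.93 amounts to in relative terms. The optional mask argument (for example, restricting the average to voxels inside the body contour) is an assumption about the evaluation protocol, and the toy dose values are made up; only the two MAE figures come from the abstract.

```python
# Illustrative sketch only: volumes are flattened to plain lists for simplicity.
# The mask is an assumed detail of the evaluation, not confirmed by the abstract.

def voxel_mae(pred, ref, mask=None):
    """Mean absolute error between predicted and reference dose voxels.

    `mask`, if given, selects which voxels contribute to the average.
    """
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same number of voxels")
    if mask is None:
        mask = [True] * len(pred)
    if len(mask) != len(pred):
        raise ValueError("mask must have the same number of voxels as pred")
    diffs = [abs(p - r) for p, r, m in zip(pred, ref, mask) if m]
    if not diffs:
        raise ValueError("mask selects no voxels")
    return sum(diffs) / len(diffs)


def relative_reduction(baseline, new):
    """Fractional reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline


# Toy example with made-up dose values; the last voxel is masked out.
pred = [10.0, 20.0, 30.0, 0.0]
ref = [12.0, 19.0, 30.0, 5.0]
mask = [True, True, True, False]
print(voxel_mae(pred, ref, mask))  # 1.0

# MAE values reported in the abstract: 2.07 (challenge winner) vs 1.93 (DiffKT3D).
print(round(100 * relative_reduction(2.07, 1.93), 1))  # 6.8 (percent)
```

The reported improvement therefore corresponds to a relative MAE reduction of roughly 6.8%.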

Problem

Research questions and friction points this paper is trying to address.

dose prediction

radiotherapy planning

generalization

voxel-wise

clinical settings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Any2Any diffusion

knowledge transfer

modality-specific conditioning