Vision-Language Models for Automated 3D PET/CT Report Generation

📅 2025-11-25
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Addressing key challenges in PET/CT report generation, including the difficulty of interpreting functional metabolic patterns, poor cross-center generalizability, and low clinical consistency, this paper proposes PETRG-3D, an end-to-end 3D dual-branch framework. It jointly processes full-volume PET and CT data and incorporates a style-adaptive prompting mechanism to model tracer-specific physiological variability and institution-dependent reporting styles. The work makes two further contributions: (1) PETRG-Lym, a new multi-center lymphoma dataset, together with AutoPET-RG-Lym, a publicly released benchmark built on open imaging data with expert-written, clinically validated reports; and (2) PETRG-Score, the first evaluation protocol to jointly quantify clinical accuracy for both metabolic abnormalities and structural lesions. The method combines 3D vision-language modeling, dual-modality encoding, and clinically validated annotations, and substantially outperforms state-of-the-art approaches: +31.49% in ROUGE-L and +8.18% on the composite PET-All clinical metric.
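The PETRG-Score idea above, region-wise agreement on both metabolic and structural findings, can be illustrated at toy scale. The real protocol's curated regions, finding labels, and aggregation rules are not given here, so everything below (the region set, binary labels, and equal-weight averaging) is an assumption for illustration only:

```python
# Hypothetical sketch in the spirit of PETRG-Score: per anatomical region,
# compare predicted vs. reference findings of each type, then average.
# The region list and binary finding encoding are assumed, not the paper's.
REGIONS = ["neck", "mediastinum", "axilla", "abdomen", "pelvis"]  # assumed set

def region_agreement(pred: dict, ref: dict, finding_type: str) -> float:
    """Fraction of regions where the predicted finding of one type
    ('metabolic' or 'structural') matches the reference finding."""
    hits = sum(pred[r][finding_type] == ref[r][finding_type] for r in REGIONS)
    return hits / len(REGIONS)

def petrg_style_score(pred: dict, ref: dict) -> dict:
    """Joint score: metabolic and structural agreement, plus their mean
    (equal weighting is an assumption, standing in for a 'PET-All' style
    composite)."""
    met = region_agreement(pred, ref, "metabolic")
    struct = region_agreement(pred, ref, "structural")
    return {"metabolic": met, "structural": struct, "all": (met + struct) / 2}
```

A protocol like this rewards clinically correct localization rather than surface n-gram overlap, which is why it complements metrics such as ROUGE-L.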

📝 Abstract
Positron emission tomography/computed tomography (PET/CT) is essential in oncology, yet the rapid expansion of scanners has outpaced the availability of trained specialists, making automated PET/CT report generation (PETRG) increasingly important for reducing clinical workload. Compared with structural imaging (e.g., X-ray, CT, and MRI), functional PET poses distinct challenges: metabolic patterns vary with tracer physiology, and whole-body 3D contextual information is required rather than local-region interpretation. To advance PETRG, we propose PETRG-3D, an end-to-end 3D dual-branch framework that separately encodes PET and CT volumes and incorporates style-adaptive prompts to mitigate inter-hospital variability in reporting practices. We construct PETRG-Lym, a multi-center lymphoma dataset collected from four hospitals (824 reports w/ 245,509 paired PET/CT slices), and construct AutoPET-RG-Lym, a publicly accessible PETRG benchmark derived from open imaging data but equipped with new expert-written, clinically validated reports (135 cases). To assess clinical utility, we introduce PETRG-Score, a lymphoma-specific evaluation protocol that jointly measures metabolic and structural findings across curated anatomical regions. Experiments show that PETRG-3D substantially outperforms existing methods on both natural language metrics (e.g., +31.49% ROUGE-L) and clinical efficacy metrics (e.g., +8.18% PET-All), highlighting the benefits of volumetric dual-modality modeling and style-aware prompting. Overall, this work establishes a foundation for future PET/CT-specific models emphasizing disease-aware reasoning and clinically reliable evaluation. Code, models, and AutoPET-RG-Lym will be released.
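ROUGE-L, the language metric reported above, scores the longest common subsequence (LCS) between a generated report and its reference. A minimal self-contained implementation of the standard LCS-based F1 formulation (whitespace tokenization here is a simplification):

```python
def lcs_len(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists,
    via the classic dynamic-programming recurrence."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Because the LCS preserves token order without requiring contiguity, ROUGE-L credits reports that keep the reference's finding order even when phrasing differs; it does not, however, check clinical correctness, which motivates a separate protocol like PETRG-Score.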
Problem

Research questions and friction points this paper is trying to address.

Automating PET/CT report generation to address specialist shortages in oncology
Handling 3D functional PET metabolic patterns requiring whole-body contextual interpretation
Mitigating inter-hospital variability in reporting practices across medical institutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end 3D dual-branch framework for PET/CT volumes
Style-adaptive prompts to reduce inter-hospital variability
Volumetric dual-modality modeling with disease-aware reasoning
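The dual-branch and style-prompt ideas above can be sketched at toy scale. The code below is not the paper's architecture: mean pooling plus a linear projection stands in for the real 3D encoders, and a string-prefix lookup stands in for learned style-adaptive prompts; all names, shapes, and prompt texts are assumptions.

```python
import numpy as np

def encode_volume(vol: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Toy 3D branch encoder: average-pool each slice of a (D, H, W)
    volume, then linearly project the per-slice profile to a feature."""
    pooled = vol.mean(axis=(1, 2))  # shape (D,): one value per slice
    return W @ pooled               # shape (feat_dim,)

def dual_branch_features(pet, ct, W_pet, W_ct) -> np.ndarray:
    """Encode PET and CT separately (dual branch), then concatenate,
    so metabolic and structural information stay distinct until fusion."""
    return np.concatenate([encode_volume(pet, W_pet), encode_volume(ct, W_ct)])

# Assumed institution-to-style mapping; real style prompts would be learned.
STYLE_PROMPTS = {
    "hospital_a": "Report in numbered-findings style.",
    "hospital_b": "Report in narrative paragraph style.",
}

def build_prompt(site: str) -> str:
    """Style-adaptive prompt: prepend an institution-specific instruction
    so one model can match each hospital's reporting conventions."""
    style = STYLE_PROMPTS.get(site, "Report in default style.")
    return style + " Describe PET/CT findings."
```

Keeping the branches separate until a late fusion step is what lets each encoder specialize (tracer-driven metabolic patterns for PET, anatomy for CT), which is the design choice the bullet list highlights.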
Wenpei Jiao
Institute of Medical Technology and National Biomedical Imaging Center, Peking University
Kun Shang
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Hui Li
Peking University Third Hospital
Ke Yan
DAMO Academy, Alibaba Group
Jiajin Zhang
DAMO Academy, Alibaba Group
Guangjie Yang
The Affiliated Hospital of Qingdao University
Lijuan Guo
The First Affiliated Hospital of Henan Medical University
Yan Wan
Jiujiang City Key Laboratory of Cell Therapy, Jiujiang No.1 People's Hospital
Xing Yang
Peking University People's Hospital
Dakai Jin
Alibaba DAMO Academy USA
Zhaoheng Xie
Institute of Medical Technology and National Biomedical Imaging Center, Peking University