Vision-Language Models for Automated 3D PET/CT Report Generation

📅 2025-11-25
📈 Citations: 0 (influential: 0)

🤖 AI Summary
PET/CT report generation is hampered by the difficulty of interpreting functional metabolic patterns, poor cross-center generalizability, and low clinical consistency. To address this, the paper proposes PETRG-3D, an end-to-end 3D dual-branch framework that encodes full PET and CT volumes in separate branches and uses style-adaptive prompts to handle institution-dependent reporting styles. It also contributes (1) PETRG-Lym, a multi-center lymphoma dataset from four hospitals, together with AutoPET-RG-Lym, a benchmark built on open imaging data with new expert-written reports that the authors plan to release; and (2) PETRG-Score, a lymphoma-specific evaluation protocol that jointly measures clinical accuracy for metabolic and structural findings. In experiments, PETRG-3D outperforms existing methods by +31.49% on ROUGE-L and +8.18% on the composite PET-All clinical metric.
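
One way to picture an evaluation that scores metabolic and structural findings jointly is a set-based F1 over (region, finding kind) labels. The sketch below illustrates that idea only. It is not PETRG-Score, and the region names, the `pet`/`ct` kinds, and the helpers `f1` and `score_report` are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical label: (anatomical region, finding kind). Neither the region
# names nor the "pet"/"ct" kinds are taken from the paper.
Finding = Tuple[str, str]


def f1(pred: Set[Finding], ref: Set[Finding]) -> float:
    """F1 between a predicted and a reference set of findings."""
    if not pred and not ref:
        return 1.0  # both reports agree that nothing is abnormal
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


def score_report(pred: Set[Finding], ref: Set[Finding]) -> Dict[str, float]:
    """Score metabolic (PET) and structural (CT) findings separately and jointly."""
    def only(kind: str, items: Set[Finding]) -> Set[Finding]:
        return {item for item in items if item[1] == kind}

    return {
        "pet": f1(only("pet", pred), only("pet", ref)),
        "ct": f1(only("ct", pred), only("ct", ref)),
        "all": f1(pred, ref),
    }


# Made-up findings extracted from a reference and a generated report.
reference = {("cervical", "pet"), ("cervical", "ct"), ("mediastinal", "pet")}
generated = {("cervical", "pet"), ("mediastinal", "pet"), ("axillary", "ct")}
scores = score_report(generated, reference)
```

In this toy case the generated report recovers both metabolic findings but names the wrong structural one, so the per-modality scores diverge while the joint score sits in between. A real protocol would also need a reliable way to extract such labels from free-text reports, which this sketch does not attempt.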

📝 Abstract
Positron emission tomography/computed tomography (PET/CT) is essential in oncology, yet the rapid expansion of scanners has outpaced the availability of trained specialists, making automated PET/CT report generation (PETRG) increasingly important for reducing clinical workload. Compared with structural imaging (e.g., X-ray, CT, and MRI), functional PET poses distinct challenges: metabolic patterns vary with tracer physiology, and whole-body 3D contextual information is required rather than local-region interpretation. To advance PETRG, we propose PETRG-3D, an end-to-end 3D dual-branch framework that separately encodes PET and CT volumes and incorporates style-adaptive prompts to mitigate inter-hospital variability in reporting practices. We construct PETRG-Lym, a multi-center lymphoma dataset collected from four hospitals (824 reports with 245,509 paired PET/CT slices), and AutoPET-RG-Lym, a publicly accessible PETRG benchmark derived from open imaging data and equipped with new expert-written, clinically validated reports (135 cases). To assess clinical utility, we introduce PETRG-Score, a lymphoma-specific evaluation protocol that jointly measures metabolic and structural findings across curated anatomical regions. Experiments show that PETRG-3D substantially outperforms existing methods on both natural language metrics (e.g., +31.49% ROUGE-L) and clinical efficacy metrics (e.g., +8.18% PET-All), highlighting the benefits of volumetric dual-modality modeling and style-aware prompting. Overall, this work establishes a foundation for future PET/CT-specific models emphasizing disease-aware reasoning and clinically reliable evaluation. Code, models, and AutoPET-RG-Lym will be released.
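
For readers unfamiliar with ROUGE-L, the natural language metric quoted above, it is based on the longest common subsequence (LCS) between a generated text and a reference. The following is a minimal sketch of the F1 form, assuming lowercased whitespace tokenization. The abstract does not state the tokenizer or the exact ROUGE-L variant used, and the example sentences are made up.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 with equal weight on LCS precision and LCS recall."""
    # Assumption: simple lowercased whitespace tokens.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up report fragments, for illustration only.
reference_text = "increased tracer uptake in cervical lymph nodes"
generated_text = "tracer uptake in enlarged cervical nodes"
score = rouge_l_f1(generated_text, reference_text)
```

Here the LCS is "tracer uptake in cervical nodes" (5 tokens), giving precision 5/6, recall 5/7, and an F1 of 10/13. Because the metric rewards shared word order rather than clinical correctness, it is usually read alongside clinical efficacy metrics.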

Problem

Research questions and friction points this paper is trying to address.

Automating PET/CT report generation to address specialist shortages in oncology
Handling 3D functional PET metabolic patterns requiring whole-body contextual interpretation
Mitigating inter-hospital variability in reporting practices across medical institutions

Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end 3D dual-branch framework for PET/CT volumes
Style-adaptive prompts to reduce inter-hospital variability
Volumetric dual-modality modeling, evaluated with the lymphoma-specific PETRG-Score protocol
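
The data flow implied by the points above (one branch per modality, plus a prompt chosen per institution) can be sketched as follows. This is a toy illustration, not the PETRG-3D architecture: the mean-intensity "encoder" stands in for a learned 3D network, and `STYLE_PROMPTS`, `encode_volume`, `build_model_input`, and the site names are all hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence

Volume = Sequence[Sequence[Sequence[float]]]  # depth x height x width

# Hypothetical style prompts keyed by institution. How the real prompts are
# defined, learned, or selected is not described in this summary.
STYLE_PROMPTS: Dict[str, str] = {
    "hospital_a": "Write findings region by region, then an impression.",
    "hospital_b": "Write one narrative paragraph followed by a conclusion.",
}


def encode_volume(volume: Volume) -> List[float]:
    """Stand-in encoder: one mean-intensity feature per axial slice."""
    return [fmean(v for row in axial for v in row) for axial in volume]


def build_model_input(pet: Volume, ct: Volume, site: str) -> dict:
    """Encode PET and CT in separate branches and attach a site-specific prompt."""
    if len(pet) != len(ct):
        raise ValueError("PET and CT must be paired slice for slice")
    return {
        "prompt": STYLE_PROMPTS[site],
        "pet_features": encode_volume(pet),  # branch 1: functional signal
        "ct_features": encode_volume(ct),    # branch 2: structural signal
    }


# Tiny made-up volumes: 2 slices of 2x2 voxels each.
pet = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 4.0]]]
ct = [[[-1000.0, -1000.0], [0.0, 0.0]], [[40.0, 40.0], [40.0, 40.0]]]
model_input = build_model_input(pet, ct, "hospital_a")
```

Keeping the two branches separate until a later fusion step lets each modality retain its own intensity scale, while the prompt carries the reporting style independently of the image content.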

Authors

Wenpei Jiao
Institute of Medical Technology and National Biomedical Imaging Center, Peking University
Kun Shang
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Optimization, SNN, AI
Hui Li
Peking University Third Hospital
Ke Yan
DAMO Academy, Alibaba Group
Jiajin Zhang
Alibaba, DAMO Academy
Medical Image Analysis, Computer Vision, Medical Imaging
Guangjie Yang
The Affiliated Hospital of Qingdao University
Lijuan Guo
The First Affiliated Hospital of Henan Medical University
Yan Wan
University of Texas at Arlington
Large-Scale Dynamical Systems and Control
Xing Yang
Peking University People’s Hospital
Dakai Jin
Alibaba DAMO Academy USA
Deep Learning, Medical Image Analysis, AI for Healthcare
Zhaoheng Xie
Institute of Medical Technology and National Biomedical Imaging Center, Peking University