Vision-Language Models for Automated 3D PET/CT Report Generation

📅 2025-11-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address challenges in PET/CT report generation, including the difficulty of interpreting functional metabolic patterns, poor cross-center generalizability, and low clinical consistency, this paper proposes PETRG-3D, an end-to-end 3D dual-branch framework. It encodes full PET and CT volumes in separate branches and uses style-adaptive prompts to mitigate institution-dependent reporting styles. The paper also contributes two datasets and an evaluation protocol: (1) PETRG-Lym, a multi-center lymphoma dataset collected from four hospitals; (2) AutoPET-RG-Lym, a benchmark built from open imaging data with new expert-written, clinically validated reports, which the authors plan to release; and (3) PETRG-Score, a lymphoma-specific protocol that jointly measures metabolic and structural findings across anatomical regions. Experiments report gains of +31.49% in ROUGE-L and +8.18% in the PET-All clinical efficacy metric over existing methods.

📝 Abstract

Positron emission tomography/computed tomography (PET/CT) is essential in oncology, yet the rapid expansion of scanners has outpaced the availability of trained specialists, making automated PET/CT report generation (PETRG) increasingly important for reducing clinical workload. Compared with structural imaging (e.g., X-ray, CT, and MRI), functional PET poses distinct challenges: metabolic patterns vary with tracer physiology, and whole-body 3D contextual information is required rather than local-region interpretation. To advance PETRG, we propose PETRG-3D, an end-to-end 3D dual-branch framework that separately encodes PET and CT volumes and incorporates style-adaptive prompts to mitigate inter-hospital variability in reporting practices. We construct PETRG-Lym, a multi-center lymphoma dataset collected from four hospitals (824 reports with 245,509 paired PET/CT slices), and build AutoPET-RG-Lym, a publicly accessible PETRG benchmark derived from open imaging data and equipped with new expert-written, clinically validated reports (135 cases). To assess clinical utility, we introduce PETRG-Score, a lymphoma-specific evaluation protocol that jointly measures metabolic and structural findings across curated anatomical regions. Experiments show that PETRG-3D substantially outperforms existing methods on both natural language metrics (e.g., +31.49% ROUGE-L) and clinical efficacy metrics (e.g., +8.18% PET-All), highlighting the benefits of volumetric dual-modality modeling and style-aware prompting. Overall, this work establishes a foundation for future PET/CT-specific models emphasizing disease-aware reasoning and clinically reliable evaluation. Code, models, and AutoPET-RG-Lym will be released.
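
For readers unfamiliar with ROUGE-L, the natural language metric cited in the abstract, the following is a minimal sketch of the standard definition: an F-measure over the longest common subsequence (LCS) of reference and generated tokens. The whitespace tokenization, the lowercasing, the `beta=1.0` default, and the function names `lcs_length` and `rouge_l` are assumptions made for illustration; the paper's exact ROUGE-L configuration is not stated on this page, and the example sentences are invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """ROUGE-L F-measure between a reference and a candidate report.

    Assumption: simple lowercase whitespace tokenization; real
    evaluation toolkits may tokenize and stem differently.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


# Hypothetical report sentences, not taken from any dataset.
reference = "increased fdg uptake in cervical lymph nodes"
candidate = "increased uptake in lymph nodes"
score = rouge_l(reference, candidate)
```

Because ROUGE-L rewards word overlap only, a report can score well while misstating a finding, which is the gap that clinical efficacy metrics such as the paper's PETRG-Score are meant to cover.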

Problem

Research questions and friction points this paper is trying to address.

Automating PET/CT report generation to address specialist shortages in oncology

Handling 3D functional PET metabolic patterns requiring whole-body contextual interpretation

Mitigating inter-hospital variability in reporting practices across medical institutions

Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end 3D dual-branch framework for PET/CT volumes

Style-adaptive prompts to reduce inter-hospital variability

Volumetric dual-modality modeling, evaluated with the lymphoma-specific PETRG-Score protocol

🔎 Similar Papers

Argus: Benchmarking and Enhancing Vision-Language Models for 3D Radiology Report Generation