LDP: Parameter-Efficient Fine-Tuning of Multimodal LLM for Medical Report Generation

📅 2025-12-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Automated colonoscopy polyp reporting suffers from inconsistency and hallucination due to scarcity of high-quality multimodal medical data. Method: We propose the first multimodal report generation framework integrating LoRA-based efficient fine-tuning with clinical preference alignment via Direct Preference Optimization (DPO), built upon Qwen2-VL-7B. We introduce a novel medical image–text alignment mechanism and release MMEndo—the first expert-annotated endoscopic image-text dataset. Contribution/Results: Our model achieves superior performance over all baselines in both automated metrics and clinical expert evaluation (7.2/10). Training costs are reduced by 833× compared to full-parameter fine-tuning. Cross-dataset validation on IU-XRay demonstrates strong generalization and robustness. This work significantly enhances clinical trustworthiness and deployment feasibility of automated colonoscopy reporting.

Technology Category

Application Category

📝 Abstract
Colonoscopic polyp diagnosis is pivotal for early colorectal cancer detection, yet traditional automated reporting suffers from inconsistencies and hallucinations due to the scarcity of high-quality multimodal medical data. To bridge this gap, we propose LDP, a novel framework leveraging multimodal large language models (MLLMs) for professional polyp diagnosis report generation. Specifically, we curate MMEndo, a multimodal endoscopic dataset comprising expert-annotated colonoscopy image-text pairs. We fine-tune the Qwen2-VL-7B backbone using Parameter-Efficient Fine-Tuning (LoRA) and align it with clinical standards via Direct Preference Optimization (DPO). Extensive experiments show that our LDP outperforms existing baselines on both automated metrics and rigorous clinical expert evaluations (achieving a Physician Score of 7.2/10), significantly reducing training computational costs by 833x compared to full fine-tuning. The proposed solution offers a scalable, clinically viable path for primary healthcare, with additional validation on the IU-XRay dataset confirming its robustness.
Problem

Research questions and friction points this paper is trying to address.

Generates professional polyp diagnosis reports from colonoscopy images
Reduces hallucinations and inconsistencies in automated medical reporting
Lowers computational costs for fine-tuning multimodal language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter-efficient fine-tuning with LoRA
Direct Preference Optimization for clinical alignment
Multimodal dataset curation for endoscopic images
🔎 Similar Papers
No similar papers found.