MEPNet: Medical Entity-balanced Prompting Network for Brain CT Report Generation

📅 2025-03-22

📈 Citations: 0

✨ Influential: 0

career value

179K/year

🤖 AI Summary

In automated brain CT report generation, models exhibit learning bias toward heterogeneous medical entities due to imbalanced spatial distributions of anatomical regions and pathological findings, resulting in repetition, missed diagnoses, and inaccurate descriptions. To address this, we propose the Medical Entity Equilibrium Prompting (MEEP) framework—a novel dual-path architecture integrating knowledge-driven joint attention with entity-specific learning-state scoring. We further design a vision–state enhanced prompting mechanism to guide large language models (LLMs) toward fair, adaptive understanding of medical entities. Through end-to-end fine-tuning and inference optimization, MEEP achieves significant improvements on two mainstream brain CT report generation benchmarks: +12.3% in clinical accuracy, +9.7% in textual coherence, and enhanced lesion coverage—effectively mitigating both redundant generation and critical sign omission.

Technology Category

Application Category

📝 Abstract

The automatic generation of brain CT reports has gained widespread attention, given its potential to assist radiologists in diagnosing cranial diseases. However, brain CT scans involve extensive medical entities, such as diverse anatomy regions and lesions, exhibiting highly inconsistent spatial patterns in 3D volumetric space. This leads to biased learning of medical entities in existing methods, resulting in repetitiveness and inaccuracy in generated reports. To this end, we propose a Medical Entity-balanced Prompting Network (MEPNet), which harnesses the large language model (LLM) to fairly interpret various entities for accurate brain CT report generation. By introducing the visual embedding and the learning status of medical entities as enriched clues, our method prompts the LLM to balance the learning of diverse entities, thereby enhancing reports with comprehensive findings. First, to extract visual embedding of entities, we propose Knowledge-driven Joint Attention to explore and distill entity patterns using both explicit and implicit medical knowledge. Then, a Learning Status Scorer is designed to evaluate the learning of entity visual embeddings, resulting in unique learning status for individual entities. Finally, these entity visual embeddings and status are elaborately integrated into multi-modal prompts, to guide the text generation of LLM. This process allows LLM to self-adapt the learning process for biased-fitted entities, thereby covering detailed findings in generated reports. We conduct experiments on two brain CT report generation benchmarks, showing the effectiveness in clinical accuracy and text coherence.

Problem

Research questions and friction points this paper is trying to address.

Biased learning of diverse medical entities in brain CT scans

Repetitiveness and inaccuracy in generated CT reports

Unbalanced interpretation of 3D spatial patterns in volumetric data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge-driven Joint Attention for entity patterns

Learning Status Scorer for entity evaluation

Multi-modal prompts to guide LLM generation

🔎 Similar Papers

No similar papers found.