OrthoInsight: Rib Fracture Diagnosis and Report Generation Based on Multi-Modal Large Models

📅 2025-07-18
🤖 AI Summary
Manual diagnosis of rib fractures on CT is time-consuming and error-prone, and writing the accompanying reports is slow. This paper proposes OrthoInsight, an end-to-end multimodal framework that addresses both problems. It uses YOLOv9 to localize rib fractures, a medical knowledge graph to retrieve clinical context, and a fine-tuned LLaVA model to generate structured diagnostic reports from CT scans and text jointly. The main contribution is the coupling of object detection, domain-specific knowledge retrieval, and multimodal large language model generation in a single pipeline aimed at clinical use. Evaluated on 28,675 annotated CT images and expert reports, the framework achieves an average score of 4.28/5.0 across four clinical metrics (diagnostic accuracy, content completeness, logical coherence, and clinical guidance value), outperforming general-purpose large models such as GPT-4 and Claude-3.

📝 Abstract
The growing volume of medical imaging data has increased the need for automated diagnostic tools, especially for musculoskeletal injuries like rib fractures, commonly detected via CT scans. Manual interpretation is time-consuming and error-prone. We propose OrthoInsight, a multi-modal deep learning framework for rib fracture diagnosis and report generation. It integrates a YOLOv9 model for fracture detection, a medical knowledge graph for retrieving clinical context, and a fine-tuned LLaVA language model for generating diagnostic reports. OrthoInsight combines visual features from CT images with expert textual data to deliver clinically useful outputs. Evaluated on 28,675 annotated CT images and expert reports, it achieves high performance across Diagnostic Accuracy, Content Completeness, Logical Coherence, and Clinical Guidance Value, with an average score of 4.28, outperforming models like GPT-4 and Claude-3. This study demonstrates the potential of multi-modal learning in transforming medical image analysis and providing effective support for radiologists.
Problem

Research questions and friction points this paper is trying to address.

Automates rib fracture diagnosis from CT scans to reduce errors
Generates clinical reports by combining visual and textual data
Improves diagnostic accuracy and efficiency for musculoskeletal injuries
Innovation

Methods, ideas, or system contributions that make the work stand out.

YOLOv9 model for fracture detection
Medical knowledge graph for clinical context
Fine-tuned LLaVA model for report generation
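
The three components listed above form a detect, retrieve, generate pipeline. The sketch below illustrates only that data flow and is not the authors' implementation: the `Detection` type, the dictionary-based knowledge graph, the prompt layout, the `min_score` threshold, and the stub detector and generator are all hypothetical stand-ins for YOLOv9, the medical knowledge graph, and the fine-tuned LLaVA model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Detection:
    """One detected region (hypothetical structure, not the paper's)."""
    label: str
    box: Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels
    score: float


def retrieve_context(detections: List[Detection],
                     knowledge_graph: Dict[str, List[str]]) -> List[str]:
    """Collect facts for each distinct detected label, in first-seen order."""
    facts: List[str] = []
    seen = set()
    for det in detections:
        if det.label in seen:
            continue
        seen.add(det.label)
        facts.extend(knowledge_graph.get(det.label, []))
    return facts


def build_prompt(detections: List[Detection], facts: List[str]) -> str:
    """Turn detections and retrieved facts into a text prompt."""
    lines = [f"Findings ({len(detections)}):"]
    for i, det in enumerate(detections, start=1):
        lines.append(f"{i}. {det.label} at {det.box} (confidence {det.score:.2f})")
    lines.append("Context:")
    lines.extend(f"- {fact}" for fact in facts)
    lines.append("Write a structured diagnostic report.")
    return "\n".join(lines)


def generate_report(image: object,
                    detector: Callable[[object], List[Detection]],
                    knowledge_graph: Dict[str, List[str]],
                    generator: Callable[[object, str], str],
                    min_score: float = 0.5) -> str:
    """Detect, filter by confidence, retrieve context, then generate."""
    detections = [d for d in detector(image) if d.score >= min_score]
    facts = retrieve_context(detections, knowledge_graph)
    prompt = build_prompt(detections, facts)
    return generator(image, prompt)


# Stubs so the sketch runs without any model weights (all values made up).
def stub_detector(image: object) -> List[Detection]:
    return [
        Detection("rib_fracture", (10, 20, 50, 60), 0.91),
        Detection("rib_fracture", (70, 80, 90, 100), 0.30),  # filtered out
    ]


def stub_generator(image: object, prompt: str) -> str:
    # A real multimodal model would condition on the image as well;
    # this stub simply echoes the prompt.
    return prompt


STUB_KG = {"rib_fracture": ["placeholder clinical note"]}

if __name__ == "__main__":
    print(generate_report(None, stub_detector, STUB_KG, stub_generator))
```

Passing the detector and generator as plain callables keeps the orchestration independent of any particular model library, so real components could be swapped in for the stubs without changing `generate_report`.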
Ningyong Wu
West China Longquan Hospital, Sichuan University, Chengdu, Sichuan, China
Jinzhi Wang
Systems Engineering Institute, Xi’an Jiaotong University, Xi’an, Shaanxi, China
Wenhong Zhao
Organizational Management Department, School of Management, Xi’an Jiaotong University, Xi’an, Shaanxi, China
Chenzhan Yu
West China Longquan Hospital, Sichuan University, Chengdu, Sichuan, China
Zhigang Xiu
West China Longquan Hospital, Sichuan University, Chengdu, Sichuan, China
Duwei Dai
Xi'an Jiaotong University Second Affiliated Hospital
artificial intelligence