🤖 AI Summary
Existing knee MRI benchmarks are predominantly unimodal and use coarse labels, which limits their utility for comprehensive meniscal injury assessment. This work addresses these limitations with MeniOmni, a structured multimodal benchmark of 746 multi-center MRI studies that combines tri-planar volumetric MRI inputs, clinical priors (e.g., age, sex, BMI), and expert-annotated clinical text. The benchmark supports two tasks: fine-grained Stoller severity grading and diagnostic report generation. The authors also propose a risk-aware ordinal evaluation scheme and a semantic consistency metric, Meni-Score, intended to better reflect clinical relevance. Baseline experiments show that incorporating clinical priors improves grading performance and reduces severe errors, highlighting the value of multimodal context for safer assessment.
📝 Abstract
Clinical diagnosis of meniscus injuries requires radiologists to integrate volumetric MRI evidence with patient context (e.g., sex, age, BMI) and to produce structured diagnostic reports. Existing knee MRI benchmarks are typically unimodal and rely on coarse labels, limiting their ability to evaluate holistic clinical reasoning. We introduce MeniOmni, a structured multimodal benchmark for meniscus injury assessment, consisting of 746 multi-center MRI studies with tri-planar volumetric inputs, Clinical Priors, and expert-annotated clinical text. MeniOmni supports two tasks: (1) fine-grained Stoller severity grading and (2) diagnostic report generation. We further propose risk-aware ordinal evaluation and a semantic consistency metric (Meni-Score) to better reflect clinical relevance. Baseline experiments show that incorporating Clinical Priors improves grading performance and reduces severe errors, highlighting the value of multimodal context for safer assessment. Code and data are available at https://github.com/ShuruiXu/MeniOmni.
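The abstract does not define the risk-aware ordinal evaluation, so the following is only an illustrative sketch of what such a metric could look like, not the authors' formulation. It assumes Stoller grades are encoded as integers 0 to 3, that under-grading (predicting a lower grade than the reference) is the riskier error, and that a "severe error" means a prediction at least two grades away from the reference. The function names, the `under_penalty` weight, and the `min_gap` threshold are all hypothetical.

```python
from typing import Sequence


def risk_weighted_ordinal_error(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    under_penalty: float = 2.0,  # hypothetical weight for under-grading
) -> float:
    """Mean absolute grade distance, with under-grading weighted more heavily.

    Illustrative only: the asymmetric weighting is an assumption, not the
    metric defined by the MeniOmni authors.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    total = 0.0
    for t, p in zip(y_true, y_pred):
        gap = abs(t - p)
        # Predicting a lower grade than the reference is treated as riskier.
        total += under_penalty * gap if p < t else gap
    return total / len(y_true)


def severe_error_rate(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    min_gap: int = 2,  # hypothetical threshold for a "severe" error
) -> float:
    """Fraction of predictions at least `min_gap` grades from the reference."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    severe = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) >= min_gap)
    return severe / len(y_true)


if __name__ == "__main__":
    reference = [0, 1, 2, 3]
    predicted = [0, 2, 1, 1]
    print(risk_weighted_ordinal_error(reference, predicted))
    print(severe_error_rate(reference, predicted))
```

Unlike plain accuracy, a measure of this kind distinguishes a one-grade slip from a miss of two or more grades, which is the sort of distinction an ordinal, risk-aware evaluation is meant to capture.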