MIMIC-SR-ICD11: A Dataset for Narrative-Based Diagnosis

📅 2025-11-07

🤖 AI Summary
This study addresses a gap in diagnostic modeling: clinically subtle signals in patient-reported narratives are frequently overlooked in electronic health records (EHRs), which can bias diagnosis. It introduces MIMIC-SR-ICD11, described as the first large-scale English diagnosis dataset built from discharge summaries and natively aligned to ICD-11. It also proposes LL-Rank, a re-ranking framework that uses Pointwise Mutual Information (PMI)-based scoring to separate the semantic compatibility between a report and a label from label-frequency bias. Concretely, LL-Rank combines a length-normalized joint likelihood of each label given the report with a report-free prior that calibrates the score for multi-label diagnosis. Across seven large language model backbones, LL-Rank consistently outperforms GenMap, a strong generation-plus-mapping baseline. Ablation studies indicate that the gains stem primarily from the PMI-based scoring.

πŸ“ Abstract
Disease diagnosis is a central pillar of modern healthcare, enabling early detection and timely intervention for acute conditions while guiding lifestyle adjustments and medication regimens to prevent or slow chronic disease. Self-reports preserve clinically salient signals that templated electronic health record (EHR) documentation often attenuates or omits, especially subtle but consequential details. To operationalize this shift, we introduce MIMIC-SR-ICD11, a large English diagnostic dataset built from EHR discharge notes and natively aligned to WHO ICD-11 terminology. We further present LL-Rank, a likelihood-based re-ranking framework that computes a length-normalized joint likelihood of each label given the clinical report context and subtracts the corresponding report-free prior likelihood for that label. Across seven model backbones, LL-Rank consistently outperforms a strong generation-plus-mapping baseline (GenMap). Ablation experiments show that LL-Rank's gains primarily stem from its PMI-based scoring, which isolates semantic compatibility from label frequency bias.
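The scoring rule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that per-token log-probabilities for each label are already available from some language model, once conditioned on the report and once without it, and that the report-free prior is length-normalized in the same way as the conditional term. The function names, label strings, and numbers below are hypothetical.

```python
from typing import Dict, List, Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized joint log-likelihood: the mean per-token log-probability."""
    if not token_logprobs:
        raise ValueError("a label must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def pmi_score(cond: Sequence[float], prior: Sequence[float]) -> float:
    """PMI-style score: report-conditioned likelihood minus report-free prior."""
    return mean_logprob(cond) - mean_logprob(prior)


def rank_labels(
    cond: Dict[str, Sequence[float]],
    prior: Dict[str, Sequence[float]],
) -> List[str]:
    """Sort candidate labels by descending PMI-style score."""
    return sorted(cond, key=lambda lab: pmi_score(cond[lab], prior[lab]), reverse=True)


# Hypothetical token log-probabilities for two candidate labels.
# The first label is frequent (high prior); the second is rare (low prior).
cond = {
    "Essential hypertension": [-0.4, -0.6],
    "Cluster headache": [-1.0, -1.2, -0.8],
}
prior = {
    "Essential hypertension": [-0.5, -0.7],
    "Cluster headache": [-3.0, -3.2, -2.8],
}

# Ranking by the conditional likelihood alone favors the frequent label.
by_likelihood = sorted(cond, key=lambda lab: mean_logprob(cond[lab]), reverse=True)
# Subtracting the prior rewards the label whose likelihood the report raised most.
by_pmi = rank_labels(cond, prior)
```

In this toy input the frequent label wins on raw likelihood, while the rare label ranks first once the prior is subtracted, because the report raises its likelihood far more than it raises that of the frequent label. That is the sense in which a PMI-style score separates semantic compatibility from label frequency.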

Problem

Research questions and friction points this paper is trying to address.

Automating disease diagnosis from patient self-reports
Mapping clinical narratives to WHO ICD-11 terminology
Reducing label frequency bias in diagnostic classification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset built from EHR discharge notes
Likelihood-based re-ranking framework for labels
PMI-based scoring reduces label frequency bias