Medical Triage as Pairwise Ranking: A Benchmark for Urgency in Patient Portal Messages

📅 2026-01-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of automating resource allocation and prioritization for asynchronous outpatient patient portal messages by framing triage as a pairwise comparison task. A large language model judges which of two messages is more medically urgent, and these judgments are used to re-sort a physician's inbox. The study introduces PMR-Bench, the first large-scale public dataset for triage of patient portal messages, together with a scalable training data generation pipeline and an automated annotation strategy that gives models in-domain guidance. Training on samples that combine unstructured patient messages with real electronic health record (EHR) data, the authors build two model families: UrgentReward, trained with a Bradley-Terry preference objective, and UrgentSFT, trained with supervised fine-tuning (SFT) via next-token prediction. UrgentSFT achieves the best performance on PMR-Bench, while UrgentReward shows advantages in low-resource settings. UrgentSFT-8B and UrgentReward-8B improve inbox sorting metrics by 15 and 16 points, respectively, over off-the-shelf 8B models.
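The Bradley-Terry objective behind UrgentReward can be illustrated with a short sketch. It assumes a reward model that maps one message (with its EHR context) to a scalar urgency score; that model is omitted here, and the function names are hypothetical. This is a generic illustration of Bradley-Terry preference learning, not the authors' training code.

```python
import math


def _sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def _neg_log_sigmoid(x: float) -> float:
    """Compute -log(sigmoid(x)) = log(1 + exp(-x)) without overflow."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def bt_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that message A is more urgent than message B.

    score_a and score_b are scalar urgency scores from a (hypothetical)
    reward model; only their difference matters.
    """
    return _sigmoid(score_a - score_b)


def bt_loss(score_more_urgent: float, score_less_urgent: float) -> float:
    """Negative log-likelihood of one labelled pair under Bradley-Terry."""
    return _neg_log_sigmoid(score_more_urgent - score_less_urgent)


def bt_batch_loss(pairs: list[tuple[float, float]]) -> float:
    """Mean loss over (score_more_urgent, score_less_urgent) pairs."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(bt_loss(w, l) for w, l in pairs) / len(pairs)


if __name__ == "__main__":
    # A pair where the more urgent message already scores 2 points higher.
    print(round(bt_probability(2.0, 0.0), 4))  # 0.8808
    print(round(bt_loss(2.0, 0.0), 4))         # 0.1269
```

Minimizing this loss pushes the score of the labelled more-urgent message above its partner's; at inference time a single forward pass per message yields a score that can be compared against any other message.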

📝 Abstract
Medical triage is the task of allocating medical resources and prioritizing patients based on medical need. This paper introduces the first large-scale public dataset for studying medical triage in the context of asynchronous outpatient portal messages. Our novel task formulation views patient message triage as a pairwise inference problem, where we train LLMs to choose "which message is more medically urgent" in a head-to-head, tournament-style re-sort of a physician's inbox. Our novel benchmark, PMR-Bench, contains 1,569 unique messages and 2,000+ high-quality test pairs for pairwise medical urgency assessment, alongside a scalable training data generation pipeline. PMR-Bench includes samples that pair unstructured patient-written messages with real electronic health record (EHR) data, emulating a real-world medical triage scenario. We develop a novel automated data annotation strategy to provide LLMs with in-domain guidance on this task. The resulting data is used to train two model classes, UrgentReward and UrgentSFT, which leverage Bradley-Terry and next-token prediction objectives, respectively, to perform pairwise urgency classification. We find that UrgentSFT achieves top performance on PMR-Bench, with UrgentReward showing distinct advantages in low-resource settings. For example, UrgentSFT-8B and UrgentReward-8B provide a 15- and 16-point boost, respectively, on inbox sorting metrics over off-the-shelf 8B models. Paper resources can be found at https://tinyurl.com/Patient-Message-Triage
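The idea of re-sorting an inbox from head-to-head judgments can be sketched as a round-robin tournament: every pair of messages is compared once, and messages are ranked by their number of wins. The abstract does not specify the exact tournament scheme, so the round-robin design, the `tournament_sort` name, and the `more_urgent` comparator (a stand-in for an LLM judge such as UrgentSFT or UrgentReward) are all assumptions for illustration.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def tournament_sort(
    messages: Sequence[T], more_urgent: Callable[[T, T], bool]
) -> list[T]:
    """Re-sort an inbox by round-robin pairwise comparisons.

    more_urgent(a, b) is a hypothetical pairwise judge returning True when
    message a is judged more urgent than message b. Each pair is compared
    once; a message's score is its number of wins. Messages with equal win
    counts keep their original inbox order because sorted() is stable.
    """
    wins = [0] * len(messages)
    for i, j in combinations(range(len(messages)), 2):
        if more_urgent(messages[i], messages[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    order = sorted(range(len(messages)), key=lambda k: -wins[k])
    return [messages[k] for k in order]


if __name__ == "__main__":
    # Toy inbox: (message text, hypothetical urgency label) pairs.
    inbox = [
        ("refill request", 1),
        ("chest pain since morning", 5),
        ("appointment question", 0),
        ("fever 39C after surgery", 4),
    ]
    # Toy judge that compares the labels instead of calling an LLM.
    ranked = tournament_sort(inbox, lambda a, b: a[1] > b[1])
    print([text for text, _ in ranked])
    # ['chest pain since morning', 'fever 39C after surgery',
    #  'refill request', 'appointment question']
```

A round-robin needs n(n-1)/2 judge calls for n messages. A comparison sort such as merge sort needs only O(n log n) calls, but it implicitly assumes the judge's preferences are transitive, which an LLM judge does not guarantee; counting wins degrades more gracefully when some judgments are inconsistent.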

Problem

Research questions and friction points this paper is trying to address.

medical triage
patient portal messages
pairwise ranking
urgency assessment
outpatient care
Innovation

Methods, ideas, or system contributions that make the work stand out.

pairwise ranking
medical triage
large language models
PMR-Bench
automated annotation