Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the lack of standardized, dynamic, interactive evaluation frameworks for large language models (LLMs) in intelligent outpatient referral (IOR) tasks. To this end, we propose the first dual-modal evaluation framework integrating static recommendation and dynamic dialogue optimization. Methodologically, we construct a structured benchmark grounded in multi-scale prompt engineering, dialogue trajectory modeling, and human calibration, covering major open- and closed-source LLMs (e.g., Llama, GPT series) alongside BERT-based baselines. Experimental results show that LLMs significantly outperform fine-tuned BERT in dynamic follow-up question generation quality, yet yield only marginal gains in static referral accuracy. Our key contributions are: (1) formalizing the core evaluation paradigm for IOR; (2) releasing the first structured benchmark and evaluation protocol specifically designed for outpatient referral; and (3) empirically characterizing the capability boundaries and applicable scenarios of LLMs in interactive clinical consultation.

📝 Abstract
Large language models (LLMs) are increasingly applied to outpatient referral tasks across healthcare systems. However, there is a lack of standardized evaluation criteria to assess their effectiveness, particularly in dynamic, interactive scenarios. In this study, we systematically examine the capabilities and limitations of LLMs in managing tasks within Intelligent Outpatient Referral (IOR) systems and propose a comprehensive evaluation framework specifically designed for such systems. This framework comprises two core tasks: static evaluation, which focuses on the ability to make predefined outpatient referrals, and dynamic evaluation, which assesses the capability to refine outpatient referral recommendations through iterative dialogues. Our findings suggest that LLMs offer limited advantages over BERT-like models in static referral accuracy, but show promise in asking effective questions during interactive dialogues.
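The two evaluation tasks described above can be sketched as a minimal harness. This is a hypothetical illustration of the static/dynamic split, not the authors' released protocol: the `recommend`/`next_action` interface, the data format, and the turn budget are all assumptions.

```python
def static_eval(model, cases):
    """Static task: accuracy of one-shot department recommendations
    against predefined gold referrals."""
    correct = sum(
        1 for case in cases
        if model.recommend(case["symptoms"]) == case["department"]
    )
    return correct / len(cases)

def dynamic_eval(model, patient, gold_department, max_turns=5):
    """Dynamic task: refine the referral over iterative dialogue turns.

    The model may ask follow-up questions before committing to a
    department. Returns (success, turns_used)."""
    history = []
    for turn in range(1, max_turns + 1):
        action = model.next_action(history)  # either ask a question or recommend
        if action["type"] == "recommend":
            return action["department"] == gold_department, turn
        # Record the question and the simulated patient's answer.
        history.append((action["question"], patient.answer(action["question"])))
    return False, max_turns
```

Under this framing, the paper's headline finding is that LLMs and fine-tuned BERT baselines score similarly on `static_eval`, while LLMs generate markedly better follow-up questions inside the `dynamic_eval` loop.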
Problem

Research questions and friction points this paper is trying to address.

Lack of standardized evaluation criteria for LLMs in outpatient referral tasks.
Need to assess LLMs in dynamic, interactive healthcare scenarios.
Absence of a dedicated framework for evaluating LLMs in Intelligent Outpatient Referral (IOR) systems.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes comprehensive LLM evaluation framework.
Includes static and dynamic evaluation tasks.
Highlights LLMs' potential in interactive dialogues.
Xiaoxiao Liu
Principal Staff Engineer, AMD
Emerging Memory · Heterogeneous System · Neuromorphic Computing · Computer Architecture
Qingying Xiao
National Health Data Institute, Shenzhen, Chinese University of Hong Kong, Shenzhen
Junying Chen
Chinese University of Hong Kong, Shenzhen
Xiangyi Feng
Chinese University of Hong Kong, Shenzhen
Xiangbo Wu
Shenzhen Research Institute of Big Data
Bairui Zhang
Chinese University of Hong Kong, Shenzhen
Jian Chang
Bournemouth University
Guangjun Yu
National Health Data Institute, Shenzhen
Yan Hu
Chinese University of Hong Kong, Shenzhen
Benyou Wang
Assistant Professor, The Chinese University of Hong Kong, Shenzhen
large language models · natural language processing · information retrieval · applied machine learning