🤖 AI Summary
This paper addresses the lack of standardized, dynamic, interactive evaluation frameworks for large language models (LLMs) in intelligent outpatient referral (IOR) tasks. To this end, we propose the first dual-modal evaluation framework integrating static recommendation and dynamic dialogue optimization. Methodologically, we construct a structured benchmark grounded in multi-scale prompt engineering, dialogue trajectory modeling, and human calibration, covering major open- and closed-source LLMs (e.g., Llama, GPT series) alongside BERT-based baselines. Experimental results show that LLMs significantly outperform fine-tuned BERT in dynamic follow-up question generation quality, yet yield only marginal gains in static referral accuracy. Our key contributions are: (1) formalizing the core evaluation paradigm for IOR; (2) releasing the first structured benchmark and evaluation protocol specifically designed for outpatient referral; and (3) empirically characterizing the capability boundaries and applicable scenarios of LLMs in interactive clinical consultation.
📝 Abstract
Large language models (LLMs) are increasingly applied to outpatient referral tasks across healthcare systems. However, there is a lack of standardized evaluation criteria for assessing their effectiveness, particularly in dynamic, interactive scenarios. In this study, we systematically examine the capabilities and limitations of LLMs in managing tasks within Intelligent Outpatient Referral (IOR) systems and propose a comprehensive evaluation framework specifically designed for such systems. The framework comprises two core tasks: static evaluation, which assesses the ability to recommend predefined outpatient referrals, and dynamic evaluation, which assesses the ability to refine referral recommendations through iterative dialogues. Our findings suggest that LLMs offer limited advantages over BERT-like models in static referral accuracy, but show promise in asking effective follow-up questions during interactive dialogues.
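To make the two evaluation tasks concrete, the sketch below shows how a dual-modal protocol of this kind could be scored: one-shot department recommendation for the static task, and a short simulated-patient loop for the dynamic task. The case fields, the `recommend` / `ask_followup` callables, and the turn limit are illustrative assumptions, not the benchmark's released protocol.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReferralCase:
    """One benchmark case: a chief complaint and its gold department label."""
    complaint: str
    gold_department: str
    hidden_details: List[str]  # facts revealed only if the model asks for them


def static_accuracy(cases: List[ReferralCase],
                    recommend: Callable[[str], str]) -> float:
    """Static task: one-shot department recommendation from the complaint alone."""
    hits = sum(recommend(c.complaint) == c.gold_department for c in cases)
    return hits / len(cases)


def dynamic_accuracy(cases: List[ReferralCase],
                     recommend: Callable[[str], str],
                     ask_followup: Callable[[str], int],
                     max_turns: int = 3) -> float:
    """Dynamic task: the model may ask follow-up questions before referring.

    ask_followup returns the index of the hidden detail it wants next,
    or -1 once it is ready to commit to a referral.
    """
    hits = 0
    for c in cases:
        context = c.complaint
        for _ in range(max_turns):
            choice = ask_followup(context)
            if choice < 0 or choice >= len(c.hidden_details):
                break
            context += " " + c.hidden_details[choice]  # simulated patient reply
        hits += recommend(context) == c.gold_department
    return hits / len(cases)
```

Comparing `static_accuracy` against `dynamic_accuracy` for the same model isolates how much the iterative questioning, rather than the base recommendation ability, contributes to referral quality, which is the distinction the framework is designed to expose.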