🤖 AI Summary
Hospital call centers generate vast volumes of staff messages that contain implicit patient service issues, yet extracting these issues efficiently remains challenging. Traditional supervised learning approaches depend on large-scale labeled data and extensive hyperparameter tuning, while strict HIPAA compliance and data security constraints further limit how the data can be used.
Method: We propose a multi-stage large language model (LLM) classification framework that first identifies message topics and then classifies each message by its reason in a multi-class fashion. It relies on zero-shot and few-shot inference rather than extensive annotation or fine-tuning, and it incorporates data security measures and HIPAA compliance requirements. Reasoning-oriented (e.g., o3), general-purpose, and lightweight LLMs are evaluated within the framework, whose outputs feed a visual decision-support system that generates actionable insights.
Results: The best-performing model, o3, achieves a 78.4% weighted F1-score and 79.2% accuracy, followed closely by gpt-5 (75.3% weighted F1-score, 76.2% accuracy). The resulting insights help identify call navigator training opportunities and support improved patient experience, offering a low-annotation, HIPAA-compliant, and interpretable approach to deploying LLMs in healthcare operations analytics.
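The Method paragraph describes a two-stage flow: assign a topic, then assign a reason within that topic, using zero-shot or few-shot prompts. A minimal sketch of that flow follows, under stated assumptions: `llm` is a hypothetical callable standing in for whichever model (e.g., o3) is reached through a HIPAA-compliant deployment, no real API is called, and `TOPICS`, `REASONS`, `build_prompt`, `parse_label`, and `classify` are illustrative names and labels, not the authors' code or taxonomy.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical label sets for illustration; the paper's actual topic and reason taxonomies are not listed here.
TOPICS = ["appointment", "prescription", "billing"]
REASONS = {
    "appointment": ["reschedule request", "no availability", "provider question"],
    "prescription": ["refill request", "prior authorization", "pharmacy question"],
    "billing": ["statement question", "insurance issue", "payment plan"],
}

Example = Tuple[str, str]  # (message text, gold label) demonstration pair


def build_prompt(message: str, labels: Sequence[str], examples: Sequence[Example] = ()) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples)."""
    parts = [
        "Classify the hospital call-center staff message into exactly one of these labels:",
        ", ".join(labels),
        "Answer with the label only.",
    ]
    for text, label in examples:  # few-shot demonstrations
        parts.append(f"Message: {text}\nLabel: {label}")
    parts.append(f"Message: {message}\nLabel:")
    return "\n\n".join(parts)


def parse_label(raw: str, labels: Sequence[str]) -> Optional[str]:
    """Map the model's free-text answer onto the allowed label set; None if off-list."""
    answer = raw.strip().lower()
    for label in labels:
        if answer == label.lower():
            return label
    return None


def classify(
    message: str,
    llm: Callable[[str], str],
    few_shot: Optional[dict] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Stage 1 picks a topic; stage 2 picks a reason within that topic."""
    few_shot = few_shot or {}
    topic = parse_label(llm(build_prompt(message, TOPICS, few_shot.get("topic", ()))), TOPICS)
    if topic is None:
        return None, None  # route to human review instead of guessing
    reasons = REASONS[topic]
    reason = parse_label(llm(build_prompt(message, reasons, few_shot.get(topic, ()))), reasons)
    return topic, reason
```

Passing the model as a plain callable keeps the sketch model-agnostic, so reasoning, general-purpose, and lightweight models can be swapped in for comparison. Constraining answers to the label set and returning `None` for off-list outputs makes malformed responses explicit instead of silently mislabeling them.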
📝 Abstract
Hospital call centers serve as the primary contact point for patients within a hospital system. They also generate substantial volumes of staff messages as navigators process patient requests and communicate with hospital offices under established protocols and guidelines. This continuously growing body of text can be mined for insights; however, traditional supervised learning approaches require annotated data, extensive training, and model tuning. Large Language Models (LLMs) offer a paradigm shift toward more computationally efficient methodologies for healthcare analytics. This paper presents a multi-stage LLM-based framework that identifies the topics of staff messages and then classifies the messages by their reasons in a multi-class fashion. Multiple LLM types, including reasoning, general-purpose, and lightweight models, were evaluated. The best-performing model was o3, achieving a 78.4% weighted F1-score and 79.2% accuracy, followed closely by gpt-5 (75.3% weighted F1-score and 76.2% accuracy). The proposed methodology incorporates the data security measures and HIPAA compliance requirements essential in healthcare environments. The processed LLM outputs are integrated into a visual decision-support tool that transforms staff messages into actionable insights accessible to healthcare professionals. This approach makes more efficient use of the collected staff messaging data, identifies navigator training opportunities, and supports improved patient experience and care quality.
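The two reported metrics can diverge under class imbalance. Weighted F1 is the support-weighted mean of per-class F1 scores, $\text{F1}_w = \sum_c \frac{n_c}{N}\,\text{F1}_c$, where $n_c$ is the number of messages whose true label is $c$ and $N$ is the total; accuracy is simply the fraction of correct predictions. A minimal pure-Python sketch of the standard definition (the same one scikit-learn uses for `average="weighted"`) is shown below; the reason labels and toy predictions are hypothetical, not the paper's data.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of messages whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights proportional to each class's true count."""
    support = Counter(y_true)
    n = len(y_true)
    total = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * count / n  # weight by support
    return total


# Hypothetical toy labels for illustration only.
y_true = ["billing", "billing", "billing", "refill", "refill", "scheduling"]
y_pred = ["billing", "billing", "refill", "refill", "refill", "billing"]
```

On this toy data accuracy is 4/6 (about 0.667) while weighted F1 is 0.6: the single "scheduling" message is never predicted correctly, so its class contributes an F1 of 0, and the stray "billing" and "refill" predictions also lower those classes' precision.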