🤖 AI Summary
Existing research is often confined to physician–patient interactions or isolated administrative subtasks and does not capture the complex administrative workflows of real-world hospitals. This work proposes H-AdminSim, the first end-to-end multi-agent simulation framework that integrates realistic workflow modeling with the FHIR healthcare data standard, providing a high-fidelity, scalable, and interoperable environment for simulating hospital administrative automation across institutions. The framework supports fine-grained, rubric-based evaluation of large language models (LLMs) under realistic workloads, which in large hospitals exceed 10,000 requests per day, and establishes the first standardized benchmark for assessing LLM performance on administrative automation tasks in healthcare settings.
📝 Abstract
Hospital administration departments handle a wide range of operational tasks and, in large hospitals, process over 10,000 requests per day, driving growing interest in LLM-based automation. However, prior work has focused primarily on patient–physician interactions or isolated administrative subtasks, failing to capture the complexity of real administrative workflows. To address this gap, we propose H-AdminSim, a comprehensive end-to-end simulation framework that combines realistic data generation with multi-agent simulation of hospital administrative workflows. The simulated tasks are evaluated quantitatively against detailed rubrics, enabling systematic comparison of LLMs. Through FHIR integration, H-AdminSim provides a unified and interoperable environment for testing administrative workflows across heterogeneous hospital settings, serving as a standardized testbed for assessing the feasibility and performance of LLM-driven administrative automation.