OpenHospital: A Thing-in-itself Arena for Evolving and Benchmarking LLM-based Collective Intelligence

📅 2026-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of effective platforms for evolving and evaluating large language model (LLM)-driven collective intelligence (CI). To this end, we propose OpenHospital, a multi-agent simulation arena in which continuous interactions between physician and patient agents drive both the evolution and the assessment of CI. The framework adopts a "data resides within the agent" paradigm that accelerates the capability growth of individual agents, and introduces a comprehensive evaluation metric combining medical proficiency with system efficiency. Experiments show that OpenHospital both fosters and accurately quantifies LLM-driven collective intelligence, providing robust infrastructure for research on collaborative AI systems in complex, domain-specific environments.
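The interaction loop and combined metric described above can be sketched minimally. This is an illustrative assumption of how such an arena might work, not the paper's actual implementation; the agent classes, case data, and metric weights are all hypothetical.

```python
class PatientAgent:
    """Presents a case; reveals symptoms only when asked. Illustrates the
    data-in-agent-self idea: the case data lives inside the agent, not in
    an external dataset."""
    def __init__(self, condition, symptoms):
        self.condition = condition
        self.symptoms = symptoms  # question -> "yes"/"no"

    def answer(self, question):
        return self.symptoms.get(question, "no")


class PhysicianAgent:
    """Asks questions, issues a diagnosis, and stores each consultation in
    its own memory so later cases can build on earlier ones (the
    evolution step)."""
    def __init__(self, knowledge):
        self.knowledge = knowledge  # condition -> telltale symptom
        self.memory = []            # experience accumulated in the agent

    def consult(self, patient, max_turns=5):
        turns = 0
        for condition, symptom in self.knowledge.items():
            turns += 1
            if patient.answer(symptom) == "yes":
                self.memory.append((symptom, condition))
                return condition, turns
            if turns >= max_turns:
                break
        return "unknown", turns


def collective_score(correct, total, turns_used, turn_budget,
                     w_accuracy=0.7, w_efficiency=0.3):
    """Combined metric: medical proficiency plus system efficiency.
    The weighting is an assumed example, not taken from the paper."""
    accuracy = correct / total
    efficiency = 1 - turns_used / turn_budget
    return w_accuracy * accuracy + w_efficiency * efficiency


physician = PhysicianAgent({"flu": "fever", "cold": "sneezing"})
patient = PatientAgent("cold", {"sneezing": "yes"})
diagnosis, turns = physician.consult(patient)
score = collective_score(correct=1, total=1, turns_used=turns, turn_budget=10)
```

The key design choice this sketch highlights is that training signal flows through interaction: the physician agent's memory grows from consultations rather than from a fixed external corpus, which is what lets capability improve without hitting a data wall.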

📝 Abstract
Large Language Model (LLM)-based Collective Intelligence (CI) presents a promising approach to overcoming the data wall and continuously boosting the capabilities of LLM agents. However, there is currently no dedicated arena for evolving and benchmarking LLM-based CI. To address this gap, we introduce OpenHospital, an interactive arena where physician agents can evolve CI through interactions with patient agents. This arena employs a data-in-agent-self paradigm that rapidly enhances agent capabilities and provides robust evaluation metrics for benchmarking both medical proficiency and system efficiency. Experiments demonstrate the effectiveness of OpenHospital in both fostering and quantifying CI.
Problem

Research questions and friction points this paper is trying to address.

Collective Intelligence
Large Language Model
Benchmarking
Agent Interaction
Evaluation Arena
Innovation

Methods, ideas, or system contributions that make the work stand out.

Collective Intelligence
Large Language Model
Data-in-Agent-Self Paradigm
Benchmarking
Interactive Arena