🤖 AI Summary
Large language models (LLMs) are prone to factual hallucination and adhere poorly to domain-specific clinical rules when structuring medical reports. To address these limitations, we propose MedPAO, a clinical-protocol-guided agent framework built on a Plan-Act-Observe (PAO) reasoning loop. MedPAO grounds the LLM's reasoning in established clinical protocols (e.g., the ABCDEF protocol for chest X-ray (CXR) interpretation) and integrates specialized tools so that the structured output is interpretable, traceable, and protocol-compliant. The framework handles both concept categorization and structured output generation through task decomposition, protocol-driven decision-making, and tool-augmented execution. In the reported experiments, MedPAO achieves an F1-score of 0.96 on concept categorization, and expert radiologists and clinicians rate its structured outputs 4.52 out of 5 on average, above baselines that rely solely on LLM-based foundation models. By embedding clinical protocols in the agent's reasoning loop, MedPAO aims to improve accuracy, trustworthiness, and professional compliance in medical report structuring.
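To make the Plan-Act-Observe idea concrete, here is a minimal sketch of such a loop. It is not MedPAO's implementation: the keyword lists, the reading of the ABCDEF mnemonic, and every name (`PROTOCOL`, `AgentState`, `plan`, `observe`, `run_pao`) are assumptions made for illustration, and a simple keyword matcher stands in for the LLM and the specialized tools.

```python
from dataclasses import dataclass, field

# Hypothetical protocol table: one common reading of the ABCDEF mnemonic,
# assumed for illustration and not taken from the MedPAO paper.
PROTOCOL = {
    "Airway": ["trachea", "airway"],
    "Bones": ["rib", "fracture", "clavicle"],
    "Cardiac": ["heart", "cardiac", "cardiomegaly"],
    "Diaphragm": ["diaphragm"],
    "Effusion": ["effusion", "pleural"],
    "Fields": ["lung", "opacity", "consolidation"],
}


def split_sentences(report):
    """Toy tool: split a report into sentences on full stops."""
    return [s.strip() for s in report.split(".") if s.strip()]


def categorize(sentence):
    """Toy tool: keyword lookup standing in for an LLM-based categorizer."""
    lowered = sentence.lower()
    for category, keywords in PROTOCOL.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "Uncategorized"


TOOLS = {"split": split_sentences, "categorize": categorize}


@dataclass
class AgentState:
    report: str
    split_done: bool = False
    pending: list = field(default_factory=list)     # sentences not yet categorized
    structured: dict = field(default_factory=dict)  # category -> list of sentences
    trace: list = field(default_factory=list)       # (tool, input, output) per step


def plan(state):
    """Plan: choose the next tool call from the current state, or None when done."""
    if not state.split_done:
        return ("split", state.report)
    if state.pending:
        return ("categorize", state.pending[0])
    return None


def observe(state, tool, arg, result):
    """Observe: fold the tool result into the state and log the step."""
    if tool == "split":
        state.pending = list(result)
        state.split_done = True
    else:
        state.structured.setdefault(result, []).append(arg)
        state.pending.pop(0)
    state.trace.append((tool, arg, result))


def run_pao(report, max_steps=50):
    """Run the Plan-Act-Observe loop until the planner has nothing left to do."""
    state = AgentState(report=report)
    for _ in range(max_steps):
        step = plan(state)
        if step is None:
            break
        tool, arg = step
        result = TOOLS[tool](arg)  # Act: execute the selected tool
        observe(state, tool, arg, result)
    return state


if __name__ == "__main__":
    report = ("The trachea is midline. Heart size is normal. "
              "Small left pleural effusion. No acute rib fracture.")
    final = run_pao(report)
    for category, sentences in final.structured.items():
        print(category, "->", sentences)
```

Every step is appended to `trace`, so each entry in the structured output can be traced back to the tool call that produced it. This is the kind of traceability the summary attributes to the protocol-driven design.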
📝 Abstract
The deployment of Large Language Models (LLMs) for structuring clinical data is critically hindered by their tendency to hallucinate facts and their inability to follow domain-specific rules. To address this, we introduce MedPAO, a novel agentic framework that ensures accuracy and verifiable reasoning by grounding its operation in established clinical protocols such as the ABCDEF protocol for CXR analysis. MedPAO decomposes the report structuring task into a transparent process managed by a Plan-Act-Observe (PAO) loop and specialized tools. This protocol-driven method provides a verifiable alternative to opaque, monolithic models. The efficacy of our approach is demonstrated through rigorous evaluation: MedPAO achieves an F1-score of 0.96 on the critical sub-task of concept categorization. Notably, expert radiologists and clinicians rated the final structured outputs with an average score of 4.52 out of 5, indicating a level of reliability that surpasses baseline approaches relying solely on LLM-based foundation models. The code is available at: https://github.com/MiRL-IITM/medpao-agent
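For readers unfamiliar with the metric, the sketch below shows how an F1-score for a multi-class categorization task can be computed. The abstract does not say which averaging scheme was used, so the macro-average here is an assumption, and the labels and predictions are invented toy values, not data from the paper.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = ["Cardiac", "Cardiac", "Bones", "Fields"]
    y_pred = ["Cardiac", "Bones", "Bones", "Fields"]
    print(f1_per_class(y_true, y_pred))
    print(round(macro_f1(y_true, y_pred), 4))
```

Macro-averaging weights every category equally, which matters when some protocol categories are rare. A micro-average would instead weight each concept equally.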