🤖 AI Summary
To address the lack of autonomous decision-making capability in medical imaging triage, this paper proposes an uncertainty-aware intelligent agent for chest X-ray triage. The method integrates confidence estimation, out-of-distribution detection, and selective prediction within a dual-path decision framework that combines rule-based reasoning with large language model (LLM)-enhanced interpretation. For each case, the agent dynamically chooses among automated diagnosis, an urgent escalation alert, and human referral. By leveraging vision-language models, it supports zero-shot inference while keeping latency low and improving decision reliability. Evaluated on the NIH ChestX-ray14 dataset, the approach significantly outperforms existing zero-shot and supervised baselines: it reduces the area under the risk-coverage curve (AURC) by 23.6% and decreases the error rate in high-coverage regions by 18.4%, all while meeting clinical real-time response requirements. According to the authors, this is the first autonomous imaging triage agent demonstrated to be both deployable and trustworthy under realistic clinical constraints.
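The three-way decision described in the summary above can be illustrated with a minimal rule-based sketch. This is not the paper's router: the names `CaseSignals`, `Action`, and `route`, the `urgent` flag, the ordering of the checks, and both threshold values are hypothetical choices made only to show how confidence and out-of-distribution signals might gate an automated decision.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    AUTO_DIAGNOSE = "automated_diagnosis"
    ESCALATE = "urgent_escalation"
    REFER = "human_referral"


@dataclass
class CaseSignals:
    """Per-case inputs to the router (all field names are illustrative)."""
    predicted_label: str
    confidence: float  # estimated probability that the prediction is correct, in [0, 1]
    ood_score: float   # higher means the image looks less like the reference data
    urgent: bool       # whether the predicted finding is treated as time-critical


def route(signals: CaseSignals,
          conf_threshold: float = 0.85,  # hypothetical value, not from the paper
          ood_threshold: float = 0.50    # hypothetical value, not from the paper
          ) -> Action:
    """Deterministic stepwise policy: abstain first, then escalate, then automate."""
    # Step 1: a poor distributional fit means the confidence itself is unreliable.
    if signals.ood_score > ood_threshold:
        return Action.REFER
    # Step 2: low confidence leads to abstention; the label is only a suggestion.
    if signals.confidence < conf_threshold:
        return Action.REFER
    # Step 3: confident, in-distribution, and time-critical findings are escalated.
    if signals.urgent:
        return Action.ESCALATE
    # Step 4: everything else is handled automatically.
    return Action.AUTO_DIAGNOSE
```

In this sketch, raising `conf_threshold` trades coverage for accuracy: more cases are referred to a human, and the automated decisions that remain are made at higher confidence.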
📝 Abstract
Agentic AI is advancing rapidly, yet truly autonomous medical-imaging triage, where a system decides when to stop, escalate, or defer under real constraints, remains relatively underexplored. To address this gap, we introduce AT-CXR, an uncertainty-aware agent for chest X-rays. The system estimates per-case confidence and distributional fit, then follows a stepwise policy to issue an automated decision or abstain with a suggested label for human intervention. We evaluate two router designs that share the same inputs and actions: a deterministic rule-based router and an LLM-decided router. Across five-fold evaluation on a balanced subset of the NIH ChestX-ray14 dataset, both variants outperform strong zero-shot vision-language models and state-of-the-art supervised classifiers. They achieve higher full-coverage accuracy and superior selective-prediction performance, evidenced by a lower area under the risk-coverage curve (AURC) and a lower error rate at high coverage, while operating with lower latency that meets practical clinical constraints. The two routers provide complementary operating points, enabling deployments to prioritize maximal throughput or maximal accuracy. Our code is available at https://github.com/XLIAaron/uncertainty-aware-cxr-agent.
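For readers unfamiliar with the selective-prediction metric mentioned above, the following is a minimal sketch of one common way to compute a risk-coverage curve and its area (AURC). It assumes the standard empirical definition: cases are ranked by confidence, and risk at each coverage level is the error rate among the most confident cases retained. The function names are illustrative, and the paper's own evaluation code may differ in details such as tie handling or integration.

```python
import numpy as np


def risk_coverage_curve(confidence, correct):
    """Return (coverage, risk) arrays for a selective classifier.

    confidence: per-case confidence scores (higher means more certain).
    correct:    per-case booleans, True where the prediction was right.
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    # Rank cases from most to least confident.
    order = np.argsort(-confidence, kind="stable")
    errors = (~correct[order]).astype(float)
    kept = np.arange(1, len(errors) + 1)
    coverage = kept / len(errors)      # fraction of cases answered automatically
    risk = np.cumsum(errors) / kept    # error rate among the answered cases
    return coverage, risk


def aurc(confidence, correct):
    """Area under the risk-coverage curve as the mean risk over all coverage levels."""
    _, risk = risk_coverage_curve(confidence, correct)
    return float(risk.mean())


# Toy example: the single error falls on the third most confident case.
conf = [0.9, 0.8, 0.7, 0.6]
hits = [True, True, False, True]
coverage, risk = risk_coverage_curve(conf, hits)
print(coverage)          # coverage levels 0.25, 0.5, 0.75, 1.0
print(risk)              # risks 0, 0, 1/3, 1/4
print(aurc(conf, hits))  # approximately 0.1458
```

A lower AURC means errors are concentrated among low-confidence cases, which is the property an abstaining triage agent relies on.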