CXRAgent: Director-Orchestrated Multi-Stage Reasoning for Chest X-Ray Interpretation

📅 2025-10-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current chest X-ray (CXR) interpretation models generalize poorly to new diagnostic tasks and struggle with complex reasoning, while existing large language model (LLM)-based agents lack mechanisms for assessing tool reliability, which undermines clinical trustworthiness. To address these limitations, we propose CXRAgent: a multi-stage, director-orchestrated agent framework for CXR interpretation. It integrates LLMs, CXR-analysis tools, an Evidence-driven Validator (EDV), contextual memory, and role-specialized expert agents for collaborative reasoning. The key contributions are a central director that forms expert teams adaptively according to task requirements and intermediate findings, and an EDV module that normalizes and verifies tool outputs and grounds them in visual evidence. Experiments on various CXR interpretation tasks show that CXRAgent delivers strong performance, provides visual evidence for its conclusions, and generalizes well to clinical tasks of different complexity.

📝 Abstract

Chest X-ray (CXR) plays a pivotal role in clinical diagnosis, and a variety of task-specific and foundation models have been developed for automatic CXR interpretation. However, these models often struggle to adapt to new diagnostic tasks and complex reasoning scenarios. Recently, LLM-based agents have emerged as a promising paradigm for CXR analysis, enhancing models' capabilities through tool coordination, multi-step reasoning, and team collaboration. However, existing agents often rely on a single diagnostic pipeline and lack mechanisms for assessing the reliability of their tools, which limits their adaptability and credibility. To this end, we propose CXRAgent, a director-orchestrated, multi-stage agent for CXR interpretation, in which a central director coordinates the following stages: (1) Tool Invocation: The agent strategically orchestrates a set of CXR-analysis tools, with outputs normalized and verified by the Evidence-driven Validator (EDV), which grounds diagnostic outputs in visual evidence to support reliable downstream diagnosis; (2) Diagnostic Planning: Guided by task requirements and intermediate findings, the agent formulates a targeted diagnostic plan. It then assembles an expert team accordingly, defining member roles and coordinating their interactions to enable adaptive and collaborative reasoning; (3) Collaborative Decision-making: The agent integrates insights from the expert team with accumulated contextual memories, synthesizing them into an evidence-backed diagnostic conclusion. Experiments on various CXR interpretation tasks show that CXRAgent delivers strong performance, provides visual evidence, and generalizes well to clinical tasks of different complexity. Code and data are available at https://github.com/laojiahuo2003/CXRAgent/.
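
To make the three stages concrete, the following is a minimal Python sketch of the control flow only. It is not the authors' implementation: every name here (Finding, Memory, validate, plan_team, decide, director, and the two stub tools) is hypothetical, the validate function is a simple stand-in for the EDV that merely requires an attached image region and a score threshold, and the "experts" are reduced to role names rather than LLM agents.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Finding:
    tool: str
    label: str
    score: float
    # Hypothetical visual evidence: a bounding box (x0, y0, x1, y1), or None.
    region: Optional[tuple[int, int, int, int]] = None


@dataclass
class Memory:
    """Contextual memory reduced to an append-only list of notes."""
    entries: list[str] = field(default_factory=list)

    def add(self, note: str) -> None:
        self.entries.append(note)


Tool = Callable[[str], Finding]


def validate(findings: list[Finding], min_score: float = 0.5) -> list[Finding]:
    # Assumed stand-in for the EDV: keep findings that carry visual
    # evidence and clear a score threshold. The real module is not
    # specified at this level of detail in the abstract.
    return [f for f in findings if f.region is not None and f.score >= min_score]


def plan_team(task: str, findings: list[Finding]) -> list[str]:
    # Stage 2: choose expert roles from the task and intermediate findings.
    roles = ["radiologist"]
    roles += [f"{label}_specialist" for label in sorted({f.label for f in findings})]
    if "report" in task:
        roles.append("report_writer")
    return roles


def decide(roles: list[str], findings: list[Finding], memory: Memory) -> str:
    # Stage 3: combine the team with memory into a conclusion (toy version).
    memory.add(f"team={roles}")
    labels = sorted({f.label for f in findings})
    if not labels:
        return "no verified finding"
    return "verified findings: " + ", ".join(labels)


def director(image_path: str, task: str, tools: list[Tool], memory: Memory) -> str:
    raw = [tool(image_path) for tool in tools]      # Stage 1: tool invocation
    verified = validate(raw)                        # Stage 1: verification
    memory.add(f"verified={[f.label for f in verified]}")
    roles = plan_team(task, verified)               # Stage 2: diagnostic planning
    return decide(roles, verified, memory)          # Stage 3: decision-making


# Stub tools standing in for real CXR-analysis models.
def classifier_stub(path: str) -> Finding:
    return Finding("classifier", "effusion", 0.9, (10, 20, 50, 60))


def detector_stub(path: str) -> Finding:
    return Finding("detector", "nodule", 0.8, None)  # no visual evidence
```

Calling director("cxr.png", "diagnosis", [classifier_stub, detector_stub], Memory()) runs the three stages in order; the second stub's finding is dropped by validate because it has no attached region, which illustrates why verification sits between tool invocation and planning.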

Problem

Research questions and friction points this paper is trying to address.

Addresses limitations of single-pipeline CXR agents lacking reliability assessment

Enhances diagnostic adaptability through multi-stage orchestration and tool validation

Improves complex reasoning via evidence-backed team collaboration and memory integration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Director-orchestrated multi-stage reasoning for CXR interpretation

Evidence-driven validator verifies tool outputs with visual evidence

Assembles expert teams for adaptive collaborative diagnostic planning
