🤖 AI Summary
To address context dilution and hallucination in multimodal clinical reasoning for gastrointestinal (GI) tumor multidisciplinary team (MDT) decision-making, this study proposes a hierarchical multi-agent framework that emulates human MDT collaboration for the integrated analysis of endoscopic images, radiological data, and biochemical biomarkers. The architecture combines multimodal large language models (MLLMs), domain-specialized agents with defined roles, and a layered reasoning pipeline, improving cross-modal information fusion and the interpretability of diagnostic inference. In clinical evaluation, the system achieved an expert rating of 4.60/5.00, significantly outperforming monolithic baseline models, with substantial gains in diagnostic accuracy, logical rigor, and decision traceability. This work establishes a scalable, verifiable paradigm for AI-augmented decision-making in complex GI oncology.
📝 Abstract
Multimodal clinical reasoning in gastrointestinal (GI) oncology requires the integrated interpretation of endoscopic imagery, radiological data, and biochemical markers. Despite the clear potential of Multimodal Large Language Models (MLLMs), they frequently suffer from context dilution and hallucination when confronted with complex, heterogeneous medical histories. To address these limitations, a hierarchical Multi-Agent Framework is proposed that emulates the collaborative workflow of a human Multidisciplinary Team (MDT). The system attained a composite expert evaluation score of 4.60/5.00, a substantial improvement over the monolithic baseline, with the agent-based architecture yielding the largest gains in reasoning logic and medical accuracy. These findings indicate that mimetic, agent-based collaboration offers a scalable, interpretable, and clinically robust paradigm for automated decision support in oncology.
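To make the hierarchical, MDT-style architecture concrete, the sketch below shows one plausible shape for such a pipeline: modality-scoped specialist agents feed findings to a coordinating "chair" agent that fuses them into a traceable recommendation. All class names, roles, and the aggregation rule here are illustrative assumptions, not the paper's actual implementation; the `analyze` method is a stub standing in for a scoped MLLM call.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    role: str        # which specialist produced this finding
    summary: str     # modality-specific interpretation
    confidence: float

class SpecialistAgent:
    """Domain-specialized agent that interprets one modality (hypothetical)."""
    def __init__(self, role: str):
        self.role = role

    def analyze(self, data: str) -> Finding:
        # Placeholder for an MLLM call scoped to this agent's modality.
        return Finding(self.role, f"{self.role} read of: {data}", 0.9)

class ChairAgent:
    """Top-level agent that fuses specialist findings into a recommendation,
    keeping each contribution traceable (mirroring MDT deliberation)."""
    def deliberate(self, findings: list[Finding]) -> str:
        ranked = sorted(findings, key=lambda f: f.confidence, reverse=True)
        lines = [f"[{f.role}] {f.summary} (conf={f.confidence:.2f})" for f in ranked]
        return "MDT recommendation based on:\n" + "\n".join(lines)

def run_mdt(case: dict[str, str]) -> str:
    """Route each modality to its specialist, then deliberate over findings."""
    findings = [SpecialistAgent(m).analyze(d) for m, d in case.items()]
    return ChairAgent().deliberate(findings)

report = run_mdt({
    "endoscopy": "ulcerated antral lesion",
    "radiology": "CT shows gastric wall thickening",
    "biochemistry": "elevated CEA",
})
print(report)
```

Because every line of the final report is tagged with the specialist that produced it, the decision remains auditable, which is one way the traceability claimed in the abstract could be realized.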