🤖 AI Summary
Vision-language models (VLMs) suffer from “reasoning detachment” in medical diagnosis: their generated textual explanations lack grounding in visual evidence, undermining clinical trustworthiness. While existing multi-agent frameworks simulate interdisciplinary discussion to mitigate single-model bias, their open-ended interactions amplify textual noise, incur high computational overhead, and fail to enforce visual grounding. This paper proposes UCAgents, the first unidirectionally convergent multi-agent framework, which strictly constrains reasoning to the medical image via a hierarchical collaborative architecture, a structured evidence-auditing mechanism, and a single-round interrogation protocol. The authors formally define a vision–text dual-noise bottleneck and optimize signal extraction using information-theoretic principles. Evaluated on four medical VQA benchmarks, the method achieves state-of-the-art performance: PathVQA accuracy improves to 71.3% (+6.0%) with an 87.7% reduction in token consumption, demonstrating both high accuracy and computational efficiency.
📝 Abstract
Vision-Language Models (VLMs) show promise in medical diagnosis, yet suffer from reasoning detachment, where linguistically fluent explanations drift from verifiable image evidence, undermining clinical trust. Recent multi-agent frameworks simulate Multidisciplinary Team (MDT) debates to mitigate single-model bias, but their open-ended discussions amplify textual noise and computational cost while failing to anchor reasoning to visual evidence, the cornerstone of medical decision-making. We propose UCAgents, a hierarchical multi-agent framework that enforces unidirectional convergence through structured evidence auditing. Inspired by clinical workflows, UCAgents forbids position changes and limits agent interactions to targeted evidence verification, suppressing rhetorical drift while amplifying visual signal extraction. UCAgents further introduces a one-round inquiry discussion to uncover potential risks of visual-textual misalignment. This design jointly constrains visual ambiguity and textual noise, the dual-noise bottleneck that we formalize via information theory. Extensive experiments on four medical VQA benchmarks show that UCAgents achieves superior accuracy (71.3% on PathVQA, +6.0% over the state of the art) with 87.7% lower token cost. The evaluation further confirms that UCAgents strikes a balance between uncovering more visual evidence and avoiding confusing textual interference. These results demonstrate that UCAgents offers both the diagnostic reliability and the computational efficiency critical for real-world clinical deployment. Code is available at https://github.com/fqhank/UCAgents.