MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning

📅 2025-10-23
🤖 AI Summary
To address three key challenges in clinical visual question answering (VQA), namely hallucinated answers, inefficient fixed-depth reasoning, and difficulty in multi-institutional collaboration, this paper proposes MedAlign. The framework combines a multimodal Direct Preference Optimization (mDPO) objective with a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture, and adds a meta-cognitive uncertainty estimator under a federated governance mechanism. Together these enable expert selection driven by image and text similarity and adaptive chain-of-thought reasoning. Its primary contributions are: (1) what it presents as the first incorporation of meta-cognitive modeling into the federated learning paradigm to support cross-institutional collaboration; and (2) improved visual alignment fidelity and reasoning efficiency. Experiments on three representative Med-VQA datasets show state-of-the-art performance, with up to an 11.85% F1-score gain over strong retrieval-augmented baselines and a 51.60% reduction in average reasoning length compared with fixed-depth CoT approaches.

📝 Abstract
Recently, large models have shown significant potential for smart healthcare. However, the deployment of Large Vision-Language Models (LVLMs) for clinical services is currently hindered by three critical challenges: a tendency to hallucinate answers not grounded in visual evidence, the inefficiency of fixed-depth reasoning, and the difficulty of multi-institutional collaboration. To address these challenges, in this paper, we develop MedAlign, a novel framework to ensure visually accurate LVLM responses for Medical Visual Question Answering (Med-VQA). Specifically, we first propose a multimodal Direct Preference Optimization (mDPO) objective to explicitly align preference learning with visual context. We then design a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture that utilizes image and text similarity to route queries to a specialized and context-augmented LVLM (i.e., an expert), thereby mitigating hallucinations in LVLMs. To achieve adaptive reasoning and facilitate multi-institutional collaboration, we propose a federated governance mechanism, where the selected expert, fine-tuned on clinical datasets based on mDPO, locally performs iterative Chain-of-Thought (CoT) reasoning via the local meta-cognitive uncertainty estimator. Extensive experiments on three representative Med-VQA datasets demonstrate that MedAlign achieves state-of-the-art performance, outperforming strong retrieval-augmented baselines by up to $11.85\%$ in F1-score, and simultaneously reducing the average reasoning length by $51.60\%$ compared with fixed-depth CoT approaches.
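As background for the preference-optimization step, the following is a minimal sketch of the standard (text-only) DPO loss for a single preference pair. It is not the paper's mDPO objective: the abstract does not give the form of the visual-context terms, so they are left out here. The function name `dpo_loss`, its parameter names, and the default `beta` are assumptions made for illustration.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are sequence log-probabilities under the trained policy and a
    frozen reference model. This is the generic DPO form only; any
    image-conditioned terms used by mDPO are NOT modeled here.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(logits)), computed in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

The loss shrinks as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model. A multimodal variant would additionally condition these log-probabilities on the image.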
Problem

Research questions and friction points this paper is trying to address.

Reduces hallucinations in medical visual question answering
Enables adaptive reasoning through federated meta-cognitive learning
Facilitates multi-institutional collaboration for clinical LVLM deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal preference optimization aligns learning with visual context
Retrieval-aware mixture-of-experts routes queries to specialized models
Federated meta-cognitive reasoning enables adaptive multi-institutional collaboration
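The routing and adaptive-reasoning ideas listed above can be illustrated with a small sketch. Everything in it is hypothetical: the expert prototypes, the weighting `alpha`, the stopping `threshold`, and the function names `route` and `adaptive_cot` are assumptions for illustration, and the paper's actual RA-MoE router and meta-cognitive uncertainty estimator may work quite differently.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def route(image_emb: Vector, text_emb: Vector,
          experts: Dict[str, Tuple[Vector, Vector]],
          alpha: float = 0.5) -> str:
    """Pick the expert whose (image, text) prototypes best match the query.

    `experts` maps a hypothetical expert name to its prototype embeddings;
    `alpha` weights image similarity against text similarity.
    """
    def score(name: str) -> float:
        image_proto, text_proto = experts[name]
        return (alpha * cosine(image_emb, image_proto)
                + (1.0 - alpha) * cosine(text_emb, text_proto))

    return max(experts, key=score)


def adaptive_cot(step_fn: Callable[[List[str]], str],
                 uncertainty_fn: Callable[[List[str]], float],
                 max_steps: int = 8,
                 threshold: float = 0.2) -> List[str]:
    """Generate reasoning steps until the estimated uncertainty is low enough.

    `step_fn` produces the next step from the steps so far; `uncertainty_fn`
    stands in for an uncertainty estimator over the current chain.
    """
    steps: List[str] = []
    for _ in range(max_steps):
        steps.append(step_fn(steps))
        if uncertainty_fn(steps) <= threshold:
            break  # confident enough: stop early instead of using a fixed depth
    return steps
```

In this sketch the chain length varies with the query: easy cases stop after a step or two, while harder ones run up to `max_steps`, which is the intuition behind replacing fixed-depth CoT.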
Siyong Chen
School of Automation, Guangdong University of Technology, Guangzhou, China
Jinbo Wen
M.S. Student, Nanjing University of Aeronautics and Astronautics
GenAI+Networking, Contract Theory, Metaverse, Blockchain
Jiawen Kang
School of Automation, Guangdong University of Technology, Guangzhou, China
Tenghui Huang
School of Automation, Guangdong University of Technology, Guangzhou, China
Xumin Huang
School of Automation, Guangdong University of Technology, Guangzhou, China
Yuanjia Su
School of Automation, Guangdong University of Technology, Guangzhou, China
Hudan Pan
State Key Laboratory of Traditional Chinese Medicine Syndrome, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangdong Provincial Hospital of Chinese Medicine, Guangdong Provincial Academy of Chinese Medical Sciences, Guangzhou, China, and Chinese Medicine Guangdong Laboratory, Zhuhai, China
Zishao Zhong
Associate Chief Physician, Guangdong Provincial Hospital of Chinese Medicine
Gastroenterology, TCM
Dusit Niyato
College of Computing and Data Science, Nanyang Technological University, Singapore
Shengli Xie
South China University of Technology; Guangdong University of Technology
blind source separation
Dong In Kim
Sungkyunkwan University (SKKU)
Wireless Communications, Internet of Things, Wireless Power Transfer, Connected Intelligence