C-MIG: Multi-view Information Gain-based Retrieval-Augmented Generation for Clinical Diagnosis Reasoning

📅 2026-05-26
🤖 AI Summary
This work addresses two limitations of existing retrieval-augmented generation (RAG) methods trained with reinforcement learning for clinical diagnosis: their binary exact-match rewards give no signal to semantically relevant but non-verbatim reasoning steps, and a single reward dimension cannot supervise heterogeneous reasoning capabilities. The authors propose C-MIG, a framework that uses a frozen reference model to estimate fine-grained information gain from two complementary views, retrieved documents and document refinement, and builds multi-view rewards that mitigate learning-signal loss and credit-assignment problems. C-MIG also adds a multi-subquery retrieval strategy to improve recall coverage of clinical knowledge. On four medical benchmarks, C-MIG achieves the best performance among RAG-based reinforcement learning methods on both in-domain and out-of-domain sets and outperforms state-of-the-art general-purpose large language models on clinical diagnosis.
📝 Abstract
Retrieval-augmented generation combined with reinforcement learning has shown promise for grounding large language models in trustworthy medical evidence. However, existing methods rely on exact-match binary rewards, which in clinical diagnosis cause two issues: (i) semantically relevant but non-verbatim steps receive zero signal, discarding valuable learning signals; and (ii) uni-dimensional rewards cannot effectively supervise heterogeneous reasoning capabilities. To address these issues, we propose C-MIG, a Multi-view Information Gain-based retrieval-augmented generation framework for Clinical diagnosis. C-MIG estimates information gain under a frozen reference model from two complementary views, retrieved-document and document-refinement, to jointly guide what to retrieve and how to refine, alleviating the issues of valuable reward signal loss and credit assignment. We further design a multi-subquery retrieval augmentation strategy that improves knowledge recall coverage in clinical diagnostic scenarios. Comprehensive experiments on four medical benchmarks demonstrate that C-MIG achieves the best performance among all RAG-RL methods on both in-domain and out-of-domain sets, and outperforms state-of-the-art general-purpose LLMs for clinical diagnosis.
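The contrast between an exact-match binary reward and an information-gain reward can be illustrated with a minimal sketch. The abstract does not give C-MIG's formulas, so everything below is an assumption made for illustration: information gain is taken as the change in the log-probability a frozen reference model assigns to the gold diagnosis when evidence is added, the document-refinement view is taken as the gain of the refined evidence over the raw retrieved text, and the two views are combined with hypothetical weights. The function `toy_log_prob` is a keyword-overlap stand-in for a real frozen language model, not the paper's reference model.

```python
import math
import re
from typing import Callable

# Hypothetical interface: log p_ref(answer | context) under a frozen reference model.
LogProbFn = Callable[[str, str], float]


def toy_log_prob(answer: str, context: str) -> float:
    """Stand-in for a frozen reference model (assumption, not the paper's model).

    Scores the answer by how many of its tokens appear in the context,
    with add-one smoothing so the probability is never 0 or 1.
    """
    answer_tokens = re.findall(r"[a-z]+", answer.lower())
    context_tokens = set(re.findall(r"[a-z]+", context.lower()))
    hits = sum(1 for tok in answer_tokens if tok in context_tokens)
    return math.log((hits + 1) / (len(answer_tokens) + 2))


def exact_match_reward(prediction: str, gold: str) -> float:
    """Binary reward: 1 only when the prediction matches the gold string."""
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0


def retrieval_gain(log_prob: LogProbFn, gold: str, query: str, retrieved: str) -> float:
    """Retrieved-document view (assumed form): gain from adding raw evidence."""
    return log_prob(gold, query + "\n" + retrieved) - log_prob(gold, query)


def refinement_gain(log_prob: LogProbFn, gold: str, query: str,
                    retrieved: str, refined: str) -> float:
    """Document-refinement view (assumed form): gain of refined over raw evidence."""
    return log_prob(gold, query + "\n" + refined) - log_prob(gold, query + "\n" + retrieved)


def multi_view_reward(log_prob: LogProbFn, gold: str, query: str,
                      retrieved: str, refined: str,
                      w_retrieve: float = 0.5, w_refine: float = 0.5) -> float:
    """Weighted sum of the two views; the weights are hypothetical."""
    return (w_retrieve * retrieval_gain(log_prob, gold, query, retrieved)
            + w_refine * refinement_gain(log_prob, gold, query, retrieved, refined))


if __name__ == "__main__":
    gold = "acute pancreatitis"
    query = "Patient with epigastric pain radiating to the back."
    retrieved = "Elevated lipase suggests pancreatitis."
    refined = "Findings are consistent with acute pancreatitis."

    # A semantically close but non-verbatim answer gets no exact-match signal.
    print(exact_match_reward("pancreatitis, acute", gold))
    # The information-gain reward is graded rather than binary.
    print(multi_view_reward(toy_log_prob, gold, query, retrieved, refined))
```

In this toy setting the non-verbatim answer receives a zero exact-match reward, while the information-gain reward is positive for both the retrieval step and the refinement step, which is the kind of per-step signal the abstract describes. Swapping `toy_log_prob` for a real frozen model's log-likelihood would keep the same interface.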
Problem

Research questions and friction points this paper is trying to address.

retrieval-augmented generation
clinical diagnosis reasoning
reinforcement learning
reward signal
information gain
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation
Information Gain
Multi-view Reward
Clinical Diagnosis Reasoning
Reinforcement Learning