🤖 AI Summary
This work addresses a limitation of existing methods that combine retrieval-augmented generation (RAG) with reinforcement learning for clinical diagnosis: they rely on binary exact-match rewards, which give no credit to semantically relevant but non-verbatim reasoning steps and offer only a uni-dimensional signal for supervising heterogeneous reasoning capabilities. To overcome these challenges, the authors propose the C-MIG framework, which uses a frozen reference model to estimate fine-grained information gain from two complementary views, retrieved documents and document refinement, and builds multi-view rewards that mitigate learning-signal loss and credit-assignment problems. C-MIG also incorporates a multi-subquery retrieval mechanism to improve recall coverage of clinical knowledge. Across four medical benchmarks, C-MIG achieves the best performance among RAG-based reinforcement learning approaches on both in-domain and out-of-domain sets, and outperforms state-of-the-art general-purpose large language models on clinical diagnosis.
📝 Abstract
Retrieval-augmented generation combined with reinforcement learning has shown promise for grounding large language models in trustworthy medical evidence. However, existing methods rely on exact-match binary rewards, which in clinical diagnosis cause two issues: (i) semantically relevant but non-verbatim steps receive zero reward, which discards valuable learning signal; and (ii) uni-dimensional rewards cannot effectively supervise heterogeneous reasoning capabilities. To address these issues, we propose C-MIG, a Multi-view Information Gain-based retrieval-augmented generation framework for Clinical diagnosis. C-MIG estimates information gain under a frozen reference model from two complementary views, retrieved-document and document-refinement, to jointly guide what to retrieve and how to refine, alleviating both the loss of valuable reward signal and the credit-assignment problem. We further design a multi-subquery retrieval augmentation strategy that improves knowledge recall coverage in clinical diagnostic scenarios. Comprehensive experiments on four medical benchmarks demonstrate that C-MIG achieves the best performance among all RAG-RL methods on both in-domain and out-of-domain sets, and outperforms state-of-the-art general-purpose LLMs for clinical diagnosis.
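
To make the idea of an information-gain reward concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes one plausible reading in which each view's gain is the change in the reference model's log-probability of the gold diagnosis when the context changes. The names `information_gain`, `multi_view_reward`, `toy_logprob`, and the weights `w_doc` and `w_ref` are all hypothetical, and `toy_logprob` is a word-overlap stand-in for a real frozen language model.

```python
import math
from typing import Callable, Dict, List

# (context, answer) -> log p(answer | context) under a frozen reference model.
LogProbFn = Callable[[str, str], float]


def information_gain(logprob: LogProbFn, base_ctx: str, new_ctx: str, answer: str) -> float:
    """Change in the answer's log-probability when base_ctx is replaced by new_ctx."""
    return logprob(new_ctx, answer) - logprob(base_ctx, answer)


def multi_view_reward(
    logprob: LogProbFn,
    question: str,
    docs: List[str],
    refined: str,
    answer: str,
    w_doc: float = 0.5,  # assumed weight, not taken from the paper
    w_ref: float = 0.5,  # assumed weight, not taken from the paper
) -> Dict[str, float]:
    """Combine two assumed views into one dense reward."""
    doc_ctx = question + "\n" + "\n".join(docs)
    ref_ctx = question + "\n" + refined
    # Retrieved-document view: do the raw documents help over the question alone?
    doc_gain = information_gain(logprob, question, doc_ctx, answer)
    # Document-refinement view: does the refined note help over the raw documents?
    refine_gain = information_gain(logprob, doc_ctx, ref_ctx, answer)
    reward = w_doc * doc_gain + w_ref * refine_gain
    return {"doc_gain": doc_gain, "refine_gain": refine_gain, "reward": reward}


def toy_logprob(context: str, answer: str) -> float:
    """Toy stand-in for a frozen LM: rewards answer-word overlap, penalizes long contexts."""
    ctx_tokens = context.lower().split()
    ctx_vocab = set(ctx_tokens)
    ans_tokens = answer.lower().split()
    hits = sum(tok in ctx_vocab for tok in ans_tokens)
    return math.log((hits + 1) / (len(ans_tokens) + 1)) - 0.01 * len(ctx_tokens)


if __name__ == "__main__":
    result = multi_view_reward(
        toy_logprob,
        question="patient with polyuria and polydipsia diagnosis",
        docs=[
            "polyuria and polydipsia are classic signs of diabetes mellitus",
            "hypertension guidelines recommend lifestyle changes first",
        ],
        refined="classic signs of diabetes mellitus",
        answer="diabetes mellitus",
    )
    print(result)
```

Unlike an exact-match reward, which would be zero for any intermediate step that does not reproduce the gold answer verbatim, both gains here are real-valued, so a helpful retrieval or a helpful refinement each receives its own credit. In practice `toy_logprob` would be replaced by a scoring call to the frozen reference model.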