AI Summary
Current medical vision-language models struggle to differentiate visually similar or otherwise confusable diseases because they cannot accumulate discriminative diagnostic experience from past failures. To address this limitation, this work proposes MedExpMem, a framework that, for the first time, integrates the clinical learning mechanisms of physicians into AI systems. MedExpMem uses reflective re-diagnosis to construct structured experiential memory, encoding key discriminative features, decision rules, and error patterns as pairwise differential notes. Coupled with retrieval-augmented reasoning, this approach enables dynamic, adaptive diagnosis. The framework adopts a two-phase memory construction pipeline and yields accuracy gains of up to 7.0% across multiple models on a benchmark spanning 11 radiology subspecialties, supporting its effectiveness and robustness.
Abstract
Experienced physicians develop diagnostic expertise through clinical practice, acquiring not only disease knowledge but also the ability to differentiate confusable conditions. Current medical vision-language models (VLMs) lack this capability: their parameters encode static knowledge that does not evolve across diagnostic encounters. We propose MedExpMem, an experience memory framework that enables VLM-based diagnostic agents to accumulate differential diagnosis expertise. Unlike retrieval-augmented generation, which retrieves encyclopedic disease descriptions, MedExpMem memorizes discriminative experience derived from the agent's own diagnostic failures and organizes it as pairwise differential notes encoding key discriminators, actionable decision rules, and reasoning error patterns. The framework adopts a two-phase construction process mirroring physician learning: initial practice exposes knowledge gaps, and reflective re-diagnosis refines understanding. When encountering new cases, the agent retrieves experience memory to guide differential reasoning. We evaluate MedExpMem on a radiology benchmark spanning 11 subspecialties. Results show consistent accuracy improvements of up to 7.0% across diverse models and scales. Analytical experiments validate experience quality and robustness, demonstrating that MedExpMem is a competitive method that addresses medical adaptation needs beyond the reach of parametric learning.
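To make the idea of pairwise differential notes more concrete, the following is a minimal sketch of how such a memory could be stored and retrieved. It is not the authors' implementation: the names DifferentialNote, ExperienceMemory, record_failure, and retrieve are hypothetical, the reflective re-diagnosis step (which in the paper would be performed by the VLM agent) is replaced by plain strings passed in by the caller, and retrieval is assumed to be a simple lookup over pairs of candidate diagnoses.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class DifferentialNote:
    """Hypothetical record for one confusable disease pair."""
    pair: frozenset
    discriminators: list = field(default_factory=list)   # key distinguishing findings
    decision_rules: list = field(default_factory=list)   # actionable "if X then favor Y" rules
    error_patterns: list = field(default_factory=list)   # how the reasoning went wrong


class ExperienceMemory:
    """Illustrative store of differential notes, keyed by unordered disease pair."""

    def __init__(self):
        self.notes = {}

    def record_failure(self, predicted, truth, discriminator, rule, error):
        """Add experience from one misdiagnosis; correct diagnoses add nothing.

        The discriminator, rule, and error strings stand in for the output of
        a reflective re-diagnosis step (an assumption of this sketch).
        """
        if predicted == truth:
            return None
        key = frozenset((predicted, truth))
        note = self.notes.setdefault(key, DifferentialNote(pair=key))
        note.discriminators.append(discriminator)
        note.decision_rules.append(rule)
        note.error_patterns.append(error)
        return note

    def retrieve(self, candidates):
        """Return the notes for every confusable pair among the candidate diagnoses."""
        pairs = combinations(sorted(set(candidates)), 2)
        return [self.notes[frozenset(p)] for p in pairs if frozenset(p) in self.notes]


memory = ExperienceMemory()
memory.record_failure(
    predicted="disease_a",
    truth="disease_b",
    discriminator="placeholder: finding seen in disease_b but not disease_a",
    rule="placeholder: if the finding is present, favor disease_b",
    error="placeholder: over-weighted a feature shared by both diseases",
)
for note in memory.retrieve(["disease_a", "disease_b", "disease_c"]):
    print(sorted(note.pair), note.decision_rules)
```

Keying notes by an unordered pair means a confusion in either direction (A mistaken for B, or B for A) accumulates in the same note, which matches the pairwise framing of the abstract; a real system would presumably also condition retrieval on image or case features, which this sketch omits.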