🤖 AI Summary
This work addresses unreliable experience reuse in clinical decision-making agents. Conventional memory mechanisms store experiences without explicit relational structure, which introduces retrieval noise and can degrade performance below that of direct LLM inference. To overcome this limitation, we propose GSEM, a graph-structured, self-evolving memory mechanism tailored for clinical reasoning. Our approach constructs a two-layer memory graph that captures both the decision structure within each experience and the associations across experiences. It further combines applicability-aware retrieval with online, feedback-driven calibration of node quality and edge weights, with the aim of making memory retrieval and reuse more accurate and robust. Evaluated on MedR-Bench and MedAgentsBench with two LLM backbones, our method reaches average accuracies of 70.90% with DeepSeek-V3.2 and 69.24% with Qwen3.5-35B, the highest average accuracy among all compared methods.
📝 Abstract
Clinical decision-making agents can benefit from reusing prior decision experience. However, many memory-augmented methods store experiences as independent records without explicit relational structure, which may introduce noisy retrieval and unreliable reuse, and in some cases even hurt performance compared to direct LLM inference. We propose GSEM (Graph-based Self-Evolving Memory), a clinical memory framework that organizes clinical experiences into a dual-layer memory graph. The graph captures both the decision structure within each experience and the relational dependencies across experiences, and it supports applicability-aware retrieval and online feedback-driven calibration of node quality and edge weights. Across MedR-Bench and MedAgentsBench with two LLM backbones, GSEM achieves the highest average accuracy among all compared methods, reaching 70.90% and 69.24% with DeepSeek-V3.2 and Qwen3.5-35B, respectively. Code is available at https://github.com/xhan1022/gsem.
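To make the dual-layer idea concrete, here is a minimal Python sketch of what such a memory graph might look like. It is an illustration under stated assumptions, not GSEM's actual implementation: the class names (`Experience`, `MemoryGraph`), the keyword-overlap scoring, the one-hop score propagation, and the moving-average feedback rule are all hypothetical choices made for this example. The linked repository is the reference for the real method.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    """One stored clinical experience (hypothetical schema)."""
    exp_id: str
    keywords: set[str]
    # Inner layer: decision structure within the experience,
    # stored as (from_step, to_step) pairs.
    decision_edges: list[tuple[str, str]] = field(default_factory=list)
    quality: float = 0.5  # node quality, calibrated by feedback


class MemoryGraph:
    """Outer layer: weighted links between experiences (illustrative only)."""

    def __init__(self, lr: float = 0.2):
        self.nodes: dict[str, Experience] = {}
        self.edges: dict[tuple[str, str], float] = {}
        self.lr = lr  # assumed step size for feedback updates

    def add(self, exp: Experience) -> None:
        self.nodes[exp.exp_id] = exp

    def link(self, a: str, b: str, weight: float = 0.5) -> None:
        # Undirected edge, keyed by the sorted pair of ids.
        self.edges[tuple(sorted((a, b)))] = weight

    def _overlap(self, query: set[str], exp: Experience) -> float:
        # Jaccard overlap as a stand-in for an applicability score.
        union = query | exp.keywords
        return len(query & exp.keywords) / len(union) if union else 0.0

    def retrieve(self, query: set[str], k: int = 2) -> list[str]:
        # Direct score: applicability scaled by node quality.
        scores = {i: self._overlap(query, e) * e.quality
                  for i, e in self.nodes.items()}
        # One-hop propagation: a strong neighbour can lift a linked node.
        boosted = dict(scores)
        for (a, b), w in self.edges.items():
            boosted[a] = max(boosted[a], w * scores[b])
            boosted[b] = max(boosted[b], w * scores[a])
        ranked = sorted(boosted, key=lambda i: (-boosted[i], i))
        return ranked[:k]

    def feedback(self, used: list[str], success: bool) -> None:
        # Move quality and edge weights toward 1 on success, 0 on failure.
        target = 1.0 if success else 0.0
        for i in used:
            node = self.nodes[i]
            node.quality += self.lr * (target - node.quality)
        for key in self.edges:
            if key[0] in used and key[1] in used:
                self.edges[key] += self.lr * (target - self.edges[key])
```

In this sketch, an agent would call `retrieve` with features of the current case, use the returned experiences in its prompt, and then call `feedback` with the outcome so that helpful experiences and the links between them gain weight over time.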