🤖 AI Summary
Current in-context learning (ICL) approaches for electronic health records (EHRs) are hindered by narrow perspectives, a lack of cohort awareness, and inefficient information aggregation, which limits their ability to support high-quality clinical reasoning. To address these challenges, this work proposes GraphWalker, a demonstration selection framework that jointly models patient clinical data and information gain estimated by large language models, so that exemplars are chosen from both a data-driven and a model-driven perspective. GraphWalker incorporates a cohort discovery mechanism to capture population-level structure and uses a lazy greedy search algorithm with frontier expansion to mitigate redundancy among demonstrations and diminishing marginal returns. Experiments on multiple real-world EHR benchmarks show that GraphWalker consistently outperforms state-of-the-art ICL baselines and substantially improves clinical reasoning performance.
📝 Abstract
Clinical Reasoning on Electronic Health Records (EHRs) is a fundamental yet challenging task in modern healthcare. While in-context learning (ICL) offers a promising inference-time adaptation paradigm for large language models (LLMs) in EHR reasoning, existing methods face three fundamental challenges: (1) Perspective Limitation, where data-driven similarity fails to align with LLM reasoning needs and model-driven signals are constrained by limited clinical competence; (2) Cohort Awareness, as demonstrations are selected independently without modeling population-level structure; and (3) Information Aggregation, where redundancy and interaction effects among demonstrations are ignored, leading to diminishing marginal gains. To address these challenges, we propose GraphWalker, a principled demonstration selection framework for EHR-oriented ICL. GraphWalker (i) jointly models patient clinical information and LLM-estimated information gain by integrating data-driven and model-driven perspectives, (ii) incorporates Cohort Discovery to avoid noisy local optima, and (iii) employs a Lazy Greedy Search with Frontier Expansion algorithm to mitigate diminishing marginal returns in information aggregation. Extensive experiments on multiple real-world EHR benchmarks demonstrate that GraphWalker consistently outperforms state-of-the-art ICL baselines, yielding substantial improvements in clinical reasoning performance. Our code is open-sourced at https://github.com/PuppyKnightUniversity/GraphWalker
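To make the "lazy greedy" idea in the abstract concrete, the following is a minimal sketch of the generic lazy greedy heuristic for selecting items under diminishing marginal returns. It is not the authors' implementation: it does not model Frontier Expansion, Cohort Discovery, or LLM-estimated information gain, and the names `lazy_greedy`, `coverage`, `coverage_gain`, and the toy patient labels are hypothetical. The sketch assumes the gain of a candidate can only shrink as more demonstrations are selected, which is what allows stale gains to serve as upper bounds.

```python
import heapq
from typing import Callable, Dict, Hashable, List, Set


def lazy_greedy(
    candidates: List[Hashable],
    marginal_gain: Callable[[Hashable, List[Hashable]], float],
    k: int,
) -> List[Hashable]:
    """Select up to k items, recomputing a gain only when its entry reaches the top."""
    selected: List[Hashable] = []
    # Max-heap via negated gains. Each entry stores the selection size at which
    # its gain was computed, so stale entries can be detected on pop.
    heap = [(-marginal_gain(c, selected), i, c, 0) for i, c in enumerate(candidates)]
    heapq.heapify(heap)

    while heap and len(selected) < k:
        neg_gain, i, cand, evaluated_at = heapq.heappop(heap)
        if evaluated_at == len(selected):
            # The gain is fresh and still on top; under diminishing returns
            # no other candidate can beat it.
            if -neg_gain <= 0:
                break  # nothing left adds information
            selected.append(cand)
        else:
            # Stale upper bound: refresh it and push the entry back.
            fresh = marginal_gain(cand, selected)
            heapq.heappush(heap, (-fresh, i, cand, len(selected)))
    return selected


# Toy stand-in for "information": each hypothetical demonstration covers a set
# of clinical concepts, and the gain is the number of newly covered concepts.
coverage: Dict[str, Set[str]] = {
    "patient_a": {"diabetes", "hypertension", "ckd"},
    "patient_b": {"diabetes", "hypertension"},
    "patient_c": {"sepsis", "ckd"},
    "patient_d": {"sepsis"},
}


def coverage_gain(cand: str, selected: List[str]) -> int:
    covered: Set[str] = set().union(*(coverage[s] for s in selected))
    return len(coverage[cand] - covered)


picked = lazy_greedy(list(coverage), coverage_gain, k=2)
print(picked)  # ['patient_a', 'patient_c']
```

In this toy run, `patient_b` looks attractive in isolation but is fully redundant once `patient_a` is chosen, so the refreshed gain drops to zero and `patient_c` is selected instead. This is the kind of redundancy among demonstrations that the abstract describes as leading to diminishing marginal gains.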