AI Summary
This work addresses two challenges in applying large language models (LLMs) to medical prediction: hallucination coupled with insufficient fine-grained clinical context, and the high latency of conventional retrieval-augmented approaches that query external knowledge bases, which impedes clinical timeliness. To overcome these limitations, the authors propose the Keys to Knowledge (K2K) framework, which encodes critical clinical information directly into the model's parameter space, replacing external retrieval with an internal key-value memory mechanism that enables low-overhead knowledge access during inference. K2K further improves retrieval quality and predictive reliability through activation-guided probing and a cross-attention reranking mechanism. Evaluated on four medical prediction benchmarks, K2K achieves state-of-the-art performance, effectively balancing low latency with high accuracy.
Abstract
Large language models (LLMs) hold significant promise for healthcare, yet their reliability in high-stakes clinical settings is often compromised by hallucinations and a lack of granular medical context. While Retrieval-Augmented Generation (RAG) can mitigate these issues, standard retrieval pipelines require computationally intensive searches over massive external knowledge bases, leading to high latency that is impractical for time-sensitive care. To address this, we introduce Keys to Knowledge (K2K), a novel framework that replaces external retrieval with internal, key-based knowledge access. By encoding essential clinical information directly into the model's parameter space, K2K enables rapid retrieval from internal key-value memory without inference-time overhead. We further enhance retrieval quality through activation-guided probe construction and cross-attention reranking. Experimental results demonstrate that K2K achieves state-of-the-art performance across four benchmark healthcare outcome prediction datasets.
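The abstract describes a two-stage access pattern: an activation-guided probe scores an internal key-value memory in one pass, and the top candidates are then reranked by a cross-attention step. The toy sketch below illustrates that pattern only; all names, shapes, and the random data are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical internal key-value memory: each row of `keys` indexes one
# clinical knowledge entry, and `values` holds the associated content vector.
d, n_entries = 64, 1000
keys = rng.standard_normal((n_entries, d))
values = rng.standard_normal((n_entries, d))

def retrieve(probe, top_k=8):
    """Stage 1: the probe (a stand-in for an activation-guided query) scores
    all keys with a single matrix product -- no external search is involved."""
    scores = keys @ probe                       # (n_entries,)
    cand = np.argsort(scores)[-top_k:]          # indices of the top-k keys
    return cand, values[cand]

def rerank(probe, cand_values):
    """Stage 2: a toy cross-attention pass -- the probe attends over the
    candidate values, and the attention weights serve as rerank scores."""
    att = cand_values @ probe / np.sqrt(d)      # scaled dot-product scores
    w = np.exp(att - att.max())                 # numerically stable softmax
    w /= w.sum()
    order = np.argsort(w)[::-1]                 # highest-weight candidate first
    return order, w

probe = rng.standard_normal(d)                  # stand-in for a hidden state
cand, cand_vals = retrieve(probe)
order, w = rerank(probe, cand_vals)
print("reranked memory slots:", cand[order])
```

Because both stages are dense matrix products over parameters already resident in memory, the lookup cost is a small constant per query, which is the property the abstract contrasts with external RAG search.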