Medical Interpretability and Knowledge Maps of Large Language Models

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
The neural mechanisms underlying medical knowledge representation in large language models (LLMs) remain poorly understood, particularly regarding how key clinical concepts—such as age, symptoms, diseases, and drugs—are encoded and hierarchically distributed. Method: We conduct a cross-model interpretability analysis of five mainstream LLMs using four complementary techniques: UMAP visualization, gradient-based saliency mapping, layer-wise ablation, and activation patching. Contribution/Results: We identify consistent patterns: (1) medical knowledge concentrates predominantly in the first half of transformer layers; (2) age encoding exhibits nonlinear geometry; (3) disease representations form circular manifold structures; (4) drug embeddings cluster by clinical specialty rather than pharmacological mechanism; and (5) certain models show mid-layer activation collapse followed by late-layer recovery. Based on these findings, we construct the first multi-model medical knowledge map, localizing the critical knowledge-storage layers at coarse resolution. This map provides actionable intervention targets for medical domain fine-tuning, bias mitigation, and knowledge editing.
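The gradient-based saliency technique mentioned above can be sketched on a toy scorer. Everything here is an illustrative assumption, not the paper's setup: the "model" is just f(x) = v·tanh(Wx), and saliency is the analytic gradient of the score with respect to the weight matrix W.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy scorer standing in for a model output: score(x) = v . tanh(W x).
# Hypothetical dimensions; a real study would use an LLM's actual weights.
DIM_IN, DIM_H = 10, 6
W = rng.normal(size=(DIM_H, DIM_IN))
v = rng.normal(size=DIM_H)

def score(x):
    return float(v @ np.tanh(W @ x))

def weight_saliency(x):
    """Analytic gradient of the score w.r.t. every entry of W (chain rule
    through tanh): d score / d W[i, j] = v[i] * (1 - tanh(pre[i])**2) * x[j]."""
    pre = W @ x
    return np.outer(v * (1.0 - np.tanh(pre) ** 2), x)

x = rng.normal(size=DIM_IN)          # stand-in for one pooled prompt embedding
sal = np.abs(weight_saliency(x))     # magnitude = how much each weight matters
i, j = np.unravel_index(np.argmax(sal), sal.shape)
print(f"most salient weight: W[{i}, {j}]")
```

Ranking weights (or, aggregated per layer, whole layers) by gradient magnitude is the basic idea behind the saliency maps the summary refers to.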

📝 Abstract
We present a systematic study of medical-domain interpretability in Large Language Models (LLMs). We study how LLMs represent and process medical knowledge through four different interpretability techniques: (1) UMAP projections of intermediate activations, (2) gradient-based saliency with respect to the model weights, (3) layer lesioning/removal, and (4) activation patching. We present knowledge maps of five LLMs which show, at a coarse resolution, where knowledge about patients' ages, medical symptoms, diseases, and drugs is stored in the models. In particular, for Llama3.3-70B we find that most medical knowledge is processed in the first half of the model's layers. In addition, we find several interesting phenomena: (i) age is often encoded in a non-linear and sometimes discontinuous manner at intermediate layers in the models, (ii) the disease-progression representation is non-monotonic and circular at certain layers of the model, (iii) drugs cluster by medical specialty rather than by mechanism of action, especially in Llama3.3-70B, and (iv) Gemma3-27B and MedGemma-27B have activations that collapse at intermediate layers but recover by the final layers. These results can guide future research on fine-tuning, un-learning, or de-biasing LLMs for medical tasks by suggesting at which layers in the model these techniques should be applied.
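Activation patching, the fourth technique in the abstract, can be illustrated on a toy stack of layers. This is a minimal sketch under stated assumptions: the "model" is a random tanh MLP rather than a transformer, and the clean/corrupted inputs are random vectors standing in for prompt pairs (e.g. correct vs. swapped drug name).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a transformer: a stack of random tanh layers.
N_LAYERS, DIM = 6, 16
weights = [rng.normal(scale=0.5, size=(DIM, DIM)) for _ in range(N_LAYERS)]

def forward(x, patch_layer=None, patch_act=None):
    """Run the toy model, optionally overwriting one layer's activation.
    Returns the final activation and the list of per-layer activations."""
    acts = []
    h = x
    for i, W in enumerate(weights):
        h = np.tanh(W @ h)
        if i == patch_layer:
            h = patch_act        # activation patching: splice in a cached activation
        acts.append(h)
    return h, acts

clean_x = rng.normal(size=DIM)    # stand-in for the "clean" prompt
corrupt_x = rng.normal(size=DIM)  # stand-in for the "corrupted" prompt

clean_out, clean_acts = forward(clean_x)
corrupt_out, _ = forward(corrupt_x)

# Patch each layer's clean activation into the corrupted run. Layers whose
# patch pulls the output back toward the clean output are implicated in
# carrying the information that differs between the two prompts.
for layer in range(N_LAYERS):
    patched_out, _ = forward(corrupt_x, patch_layer=layer,
                             patch_act=clean_acts[layer])
    dist = np.linalg.norm(patched_out - clean_out)
    print(f"layer {layer}: distance to clean output after patching = {dist:.3f}")
```

Applied per layer across many medical prompt pairs, this distance profile is the kind of evidence a coarse knowledge map of "where a concept is stored" can be built from.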
Problem

Research questions and friction points this paper is trying to address.

Analyzing medical knowledge representation in LLMs
Identifying storage locations of medical concepts in models
Guiding fine-tuning strategies through layer-specific interventions
Innovation

Methods, ideas, or system contributions that make the work stand out.

UMAP projections analyze intermediate layer activations
Gradient saliency maps reveal model weight importance
Layer lesioning identifies critical medical knowledge locations
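The layer-lesioning idea in the bullets above can be sketched with a residual toy network, where "removing" a layer simply skips its update. The residual form, dimensions, and random weights are illustrative assumptions, loosely mirroring how a transformer block's contribution can be ablated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Residual toy network: h <- h + tanh(W h). The residual form makes lesioning
# well-defined (skip the layer's update), as in transformer-style ablation.
N_LAYERS, DIM = 8, 12
weights = [rng.normal(scale=0.3, size=(DIM, DIM)) for _ in range(N_LAYERS)]

def forward(x, lesioned=()):
    h = x.copy()
    for i, W in enumerate(weights):
        if i in lesioned:
            continue             # layer lesioning: drop this layer's contribution
        h = h + np.tanh(W @ h)
    return h

x = rng.normal(size=DIM)         # stand-in for a pooled prompt representation
baseline = forward(x)

# Rank layers by how much the output moves when each one is removed; large
# shifts suggest the layer carries information downstream computation relies on.
impact = [np.linalg.norm(forward(x, lesioned={i}) - baseline)
          for i in range(N_LAYERS)]
for i, d in enumerate(impact):
    print(f"layer {i}: output shift when removed = {d:.3f}")
```

In the actual study the analogous shift would be measured on task outputs (e.g. accuracy on medical questions), and layers whose removal most degrades performance mark candidate knowledge-storage locations.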