Eliciting associations between clinical variables from LLMs via comparison questions across populations

📅 2026-05-07

🤖 AI Summary
This work proposes an approach for extracting associations and candidate causal relationships among clinical variables from large language models (LLMs) to support medical decision-making. To avoid the pitfalls of direct questioning, it elicits information indirectly through structured patient-comparison triplet questions. A statistical model of the LLM's representation turns the answers into correlation estimates without access to activations or model internals. Prompt-level environment shifts then yield correlation estimates for different subpopulations, which Invariant Causal Prediction (ICP) uses to infer conservative candidate parent links. Demonstrated in two clinical domains, chronic obstructive pulmonary disease (COPD) and multiple sclerosis (MS), the elicited correlations are smooth, stable, and clinically interpretable, vary in a statistically significant way across prompted environments, and lead ICP to a small set of candidate invariant parent links.
📝 Abstract
The training data of large language models (LLMs) comprises a wide range of biomedical literature, reflecting data from many different patient populations. We investigate how it might be possible to recover information on correlation and causal links between patient characteristics, as a key building block for medical decision making. To avoid the pitfalls of direct elicitation, we propose an approach based on structured comparison questions, specifically patient comparison triplet questions. This is combined with a statistical model for the LLM representation that provides estimates of correlations without access to activations or model internals. Intuitively, we consider how similarity decisions of LLMs based on a first variable are affected by providing information on a second variable for one of the patients being assessed. We then induce prompt-level environment shifts to obtain correlation estimates for different subpopulations, which enables an invariant causal prediction (ICP) approach to obtain conservative candidate parent links. We demonstrate the method in two clinical domains, chronic obstructive pulmonary disease (COPD) and multiple sclerosis (MS). Across prompted environments, the elicited correlations are smooth, stable, and clinically interpretable, yet vary in a statistically significant way that supports downstream invariance testing, such that ICP provides a small set of candidate invariant parent links. These results show that indirect elicitation via triplet comparisons can recover meaningful association structure from LLMs and offer a cautious route from implicit correlations to causal statements that are congruent with LLM answering patterns.
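The two ingredients named in the abstract, triplet comparison questions under a prompted environment and the final ICP step, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The prompt wording, the function names `build_triplet_prompt` and `icp_candidate_parents`, and all example values are hypothetical. The sketch also assumes that invariance-test p-values for each candidate predictor set have already been computed elsewhere. It only shows the standard ICP rule of intersecting all predictor sets that are not rejected.

```python
def build_triplet_prompt(environment, variable, reference, option_a, option_b, extra=None):
    """Return a hypothetical triplet-comparison question as plain text.

    environment: free-text description of the subpopulation; changing it
        is the assumed form of a prompt-level environment shift.
    variable: name of the first variable the similarity judgement rests on.
    reference, option_a, option_b: values of that variable for three patients.
    extra: optional (name, value) pair for a second variable, revealed for
        patient A only, so its effect on the answer can be observed.
    """
    patient_a = f"Patient A: {variable} = {option_a}"
    if extra is not None:
        name, value = extra
        patient_a += f", {name} = {value}"
    lines = [
        f"Consider patients from this population: {environment}.",
        f"Reference patient: {variable} = {reference}.",
        patient_a + ".",
        f"Patient B: {variable} = {option_b}.",
        "Which patient is more similar to the reference patient? Answer A or B.",
    ]
    return "\n".join(lines)


def icp_candidate_parents(p_values, alpha=0.05):
    """Intersect all predictor sets whose invariance test is not rejected.

    p_values maps a frozenset of variable names to the p-value of a test
    that the target behaves the same way given that set in every
    environment. The tests themselves are NOT implemented here.
    """
    accepted = [s for s, p in p_values.items() if p >= alpha]
    if not accepted:
        # No set looks invariant across environments: report no parents.
        return frozenset()
    return frozenset.intersection(*accepted)


# Illustrative call with made-up values (not data or wording from the paper).
example_prompt = build_triplet_prompt(
    "adults with COPD", "age", 60, 62, 75,
    extra=("smoking status", "current smoker"),
)

# Made-up p-values for placeholder variables X1 and X2.
example_parents = icp_candidate_parents({
    frozenset({"X1"}): 0.40,
    frozenset({"X1", "X2"}): 0.30,
    frozenset({"X2"}): 0.01,
})
```

Taking the intersection is what makes the output conservative: a variable is reported only if every accepted set contains it, which is consistent with the abstract's description of a small set of candidate invariant parent links.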
Problem

Research questions and friction points this paper is trying to address.

large language models
clinical variables
causal inference
correlation elicitation
medical decision making
Innovation

Methods, ideas, or system contributions that make the work stand out.

triplet comparison
invariant causal prediction
LLM elicitation
clinical correlation
prompt-level environment shift