🤖 AI Summary
This work addresses zero-shot inference of causal structure among biological entities (e.g., genes, proteins), where no domain-specific training data is used. The authors propose an evaluation framework that directly compares large language model (LLM) outputs against ground-truth interventional experimental data, systematically validated across more than one hundred biological variables and thousands of causal hypotheses. The method combines several prompting strategies with retrieval-augmented generation (RAG) over large, and potentially conflicting, collections of scientific articles. Results show that, with tailored augmentation and prompting, even relatively small LLMs capture meaningful aspects of causal structure in biological systems, including cell-context-specific and mechanistically complex relationships, without any domain-specific fine-tuning. This supports the view that LLMs can act as orchestration tools in biological discovery, distilling current knowledge into forms amenable to downstream causal analysis.
📝 Abstract
Genes, proteins and other biological entities influence one another via causal molecular networks. Causal relationships in such networks are mediated by complex and diverse mechanisms, operate through latent variables, and are often specific to cellular context. Characterising such networks in practice remains challenging. Here, we present a novel framework to evaluate large language models (LLMs) for zero-shot inference of causal relationships in biology. In particular, we systematically evaluate causal claims obtained from an LLM against real-world interventional data, over one hundred variables and thousands of causal hypotheses. Furthermore, we consider several prompting and retrieval-augmentation strategies, including retrieval over large, and potentially conflicting, collections of scientific articles. Our results show that with tailored augmentation and prompting, even relatively small LLMs can capture meaningful aspects of causal structure in biological systems. This supports the notion that LLMs could act as orchestration tools in biological discovery, by helping to distil current knowledge in ways amenable to downstream analysis. Our approach to assessing LLMs with respect to experimental data is relevant for a broad range of problems at the intersection of causal learning, LLMs and scientific discovery.
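The core evaluation idea, scoring an LLM's causal claims against interventional ground truth and a random baseline, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the gene pairs, labels, and the `score_claims` helper are all hypothetical.

```python
import random

def score_claims(claims, ground_truth):
    """Fraction of hypothesized cause -> effect pairs whose claimed
    label matches the interventional ground truth (toy accuracy metric)."""
    correct = sum(ground_truth[pair] == label for pair, label in claims.items())
    return correct / len(claims)

# Toy interventional ground truth: does perturbing gene A change gene B?
# (Pairs and labels are illustrative only.)
ground_truth = {("KRAS", "MYC"): True, ("MYC", "KRAS"): False,
                ("TP53", "MDM2"): True, ("MDM2", "TP53"): True}

# Causal claims elicited from an LLM (again, illustrative values).
llm_claims = {("KRAS", "MYC"): True, ("MYC", "KRAS"): False,
              ("TP53", "MDM2"): True, ("MDM2", "TP53"): False}

# Random baseline: guess each label uniformly at random.
random.seed(0)
random_claims = {pair: random.random() < 0.5 for pair in ground_truth}

print(score_claims(llm_claims, ground_truth))     # 0.75 on this toy example
print(score_claims(random_claims, ground_truth))
```

In practice one would aggregate such comparisons over thousands of hypotheses and report calibrated metrics rather than raw accuracy on four pairs, but the comparison against a random baseline follows the same pattern.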