🤖 AI Summary
In causal discovery, expert knowledge, such as the output of large language models, varies widely in reliability and often conflicts with observational data. To address this, we propose L2D-CD, the first framework to integrate learning-to-defer into causal discovery. It decides adaptively, for each variable pair, whether to rely on data-driven inference or on expert advice. L2D-CD combines pairwise causal discovery on numerical data with an encoding of textual metadata to build a deferral function over the two paths, which supports fine-grained confidence estimation and identifies where the expert is strong or weak across domains. Evaluated on the Tübingen cause-effect pairs benchmark, L2D-CD outperforms both the purely data-driven baseline and the purely expert-based baseline. It also reveals domain-specific variation in expert performance, giving interpretable insight into when expert guidance can be trusted in different causal scenarios.
📝 Abstract
Integrating expert knowledge, e.g., from large language models, into causal discovery algorithms can be challenging when the knowledge is not guaranteed to be correct. Expert recommendations may contradict data-driven results, and their reliability can vary significantly depending on the domain or specific query. Existing methods based on soft constraints or inconsistencies in predicted causal relationships fail to account for these variations in expertise. To remedy this, we propose L2D-CD, a method for gauging the correctness of expert recommendations and optimally combining them with data-driven causal discovery results. By adapting learning-to-defer (L2D) algorithms for pairwise causal discovery (CD), we learn a deferral function that selects, for each variable pair, whether to rely on a classical causal discovery method applied to the numerical data or on an expert recommendation based on textual metadata. We evaluate L2D-CD on the canonical Tübingen pairs dataset and demonstrate its superior performance compared to both the causal discovery method and the expert used in isolation. Moreover, our approach identifies domains where the expert's performance is strong or weak. Finally, we outline a strategy for generalizing this approach to causal discovery on graphs with more than two variables, paving the way for further research in this area.
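To make the idea of a per-pair deferral decision concrete, the following is a minimal Python sketch, not the L2D-CD implementation. It assumes that an estimated probability of correctness is already available for both the data-driven method and the expert on each pair. In the paper the deferral function is learned, whereas here a simple comparison of the two estimates stands in for it. All names (`PairPrediction`, `defer_to_expert`, `combined_direction`, `accuracy`) and the toy numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairPrediction:
    """Predictions for one variable pair (hypothetical structure)."""
    cd_direction: str         # direction from the data-driven method, "X->Y" or "Y->X"
    cd_confidence: float      # assumed estimate of P(data-driven method is correct)
    expert_direction: str     # direction recommended by the expert
    expert_confidence: float  # assumed estimate of P(expert is correct)


def defer_to_expert(p: PairPrediction) -> bool:
    """Stand-in deferral rule: defer only if the expert looks strictly more reliable."""
    return p.expert_confidence > p.cd_confidence


def combined_direction(p: PairPrediction) -> str:
    """Return the direction chosen after the deferral decision."""
    return p.expert_direction if defer_to_expert(p) else p.cd_direction


def accuracy(predicted: List[str], truth: List[str]) -> float:
    """Fraction of pairs whose predicted direction matches the ground truth."""
    if not truth or len(predicted) != len(truth):
        raise ValueError("predicted and truth must be non-empty and of equal length")
    return sum(a == b for a, b in zip(predicted, truth)) / len(truth)


# Toy data, invented for illustration only.
pairs = [
    PairPrediction("X->Y", 0.90, "Y->X", 0.60),  # data-driven method is trusted
    PairPrediction("Y->X", 0.55, "X->Y", 0.80),  # expert is trusted
    PairPrediction("X->Y", 0.70, "X->Y", 0.70),  # tie: stay with the data-driven method
]
truth = ["X->Y", "X->Y", "X->Y"]

cd_only = [p.cd_direction for p in pairs]
expert_only = [p.expert_direction for p in pairs]
combined = [combined_direction(p) for p in pairs]

print("data-driven only:", accuracy(cd_only, truth))
print("expert only:     ", accuracy(expert_only, truth))
print("with deferral:   ", accuracy(combined, truth))
```

In this toy setting each source is wrong on a different pair, so choosing per pair recovers the correct direction everywhere. How well this works in practice depends entirely on the quality of the confidence estimates, which is the part a learned deferral function is meant to supply.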