🤖 AI Summary
Existing retrieval models rely primarily on surface-level semantic matching, which limits their ability to capture deeper relational structures, particularly causal relationships, and undermines accuracy in knowledge-intensive domains such as biomedicine and law. To address this, we propose CAWAI, the first dense retrieval framework that jointly optimizes semantic similarity and causal relevance through a dual-objective paradigm. CAWAI enhances semantic representations via contrastive learning and explicitly encodes inter-variable causal dependencies using a structured causal graph. Crucially, it achieves strong zero-shot cross-domain generalization without domain-specific annotations. Empirical evaluation on a large-scale causal retrieval benchmark demonstrates significant improvements over state-of-the-art models. Moreover, on multi-domain scientific question answering tasks, CAWAI achieves a 12.6% absolute gain in zero-shot accuracy, markedly improving the factual reliability of large language models (LLMs) in high-precision applications.
📝 Abstract
Recent advancements in large language models (LLMs) have significantly enhanced the performance of conversational AI systems. To extend their capabilities to knowledge-intensive domains such as biomedicine and law, where accuracy is critical, LLMs are often combined with information retrieval (IR) systems so that responses are generated from retrieved documents. However, for IR systems to effectively support such applications, they must go beyond simple semantic matching and accurately capture diverse query intents, including causal relationships. Existing IR models primarily retrieve documents based on surface-level semantic similarity, overlooking deeper relational structures such as causality. To address this, we propose CAWAI, a retrieval model trained with dual objectives: semantic relations and causal relations. Our extensive experiments demonstrate that CAWAI outperforms various models on diverse causal retrieval tasks, especially under large-scale retrieval settings. We also show that CAWAI exhibits strong zero-shot generalization across scientific-domain QA tasks.
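The abstract does not spell out how the two objectives are combined. As a rough illustration only, one common way to train a retriever on two signals is a weighted sum of two in-batch contrastive (InfoNCE) losses, one over semantically matched pairs and one over cause-effect pairs. The sketch below assumes that setup; the function names, the `alpha` weight, the temperature value, and the cause/effect pairing are all hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(queries, docs, temperature=0.05):
    """In-batch contrastive loss.

    docs[i] is treated as the positive for queries[i]; every other
    document in the batch acts as a negative.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, d) / temperature for d in docs]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(queries)


def dual_objective_loss(sem_q, sem_d, cause_vecs, effect_vecs,
                        alpha=0.5, temperature=0.05):
    """Weighted sum of a semantic and a causal contrastive term.

    ASSUMPTION: this weighting scheme is illustrative, not the
    formulation used by CAWAI.
    """
    semantic = info_nce(sem_q, sem_d, temperature)
    causal = info_nce(cause_vecs, effect_vecs, temperature)
    return alpha * semantic + (1 - alpha) * causal
```

In this sketch, `alpha` trades off the two terms: `alpha=1.0` reduces to a purely semantic retriever, while lower values put more weight on matching causes to their effects.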