Retrieval Augmented Generation based Large Language Models for Causality Mining

📅 2025-05-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing causal detection methods face a dual bottleneck: unsupervised approaches exhibit poor cross-domain generalization, while supervised methods are constrained by scarce annotated data. To address this, we propose the first retrieval-augmented generation (RAG)-based dynamic prompting framework tailored for causal mining. Our method jointly integrates large language models (LLMs)—including LLaMA-3, Qwen, and GLM—with causal pattern matching and semantic re-ranking modules, leveraging context-aware causal rule retrieval and adaptive prompt construction. Crucially, it operates without requiring large-scale labeled data, thereby significantly enhancing few-shot robustness and cross-domain generalizability. Evaluated on three standard causal detection benchmarks, our approach achieves an average 12.7% F1-score improvement over static prompting baselines. The gains hold consistently across five mainstream LLMs, demonstrating both the universality and effectiveness of the proposed framework.

Technology Category

Application Category

📝 Abstract
Causality detection and mining are important tasks in information retrieval due to their enormous use in information extraction, and knowledge graph construction. To solve these tasks, in existing literature there exist several solutions -- both unsupervised and supervised. However, the unsupervised methods suffer from poor performance and they often require significant human intervention for causal rule selection, leading to poor generalization across different domains. On the other hand, supervised methods suffer from the lack of large training datasets. Recently, large language models (LLMs) with effective prompt engineering are found to be effective to overcome the issue of unavailability of large training dataset. Yet, in existing literature, there does not exist comprehensive works on causality detection and mining using LLM prompting. In this paper, we present several retrieval-augmented generation (RAG) based dynamic prompting schemes to enhance LLM performance in causality detection and extraction tasks. Extensive experiments over three datasets and five LLMs validate the superiority of our proposed RAG-based dynamic prompting over other static prompting schemes.
Problem

Research questions and friction points this paper is trying to address.

Improving causality detection accuracy in information retrieval
Reducing human intervention in causal rule selection
Enhancing LLM performance with dynamic prompting schemes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval Augmented Generation enhances causality mining
Dynamic prompting schemes improve LLM performance
RAG-based approach outperforms static prompting methods
T
Thushara Manjari Naduvilakandy
Department of Computer Science, Indiana University Indianapolis
H
Hyeju Jang
Department of Computer Science, Indiana University Indianapolis
Mohammad Al Hasan
Mohammad Al Hasan
Professor of Computer Science, Indiana University Indianapolis, USA
Data MiningNetwork AnalysisGraph Machine Learning