Rule-Based Explanations for Retrieval-Augmented LLM Systems

📅 2025-10-26
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limited interpretability of large language model (LLM) systems with retrieval-augmented generation (RAG). We propose the first rule-based explanation framework for RAG, generating human-readable *if-then* rules that capture statistical associations between the presence or absence of retrieved evidence and model outputs. Methodologically, inspired by frequent itemset mining, we design an Apriori-style pruning strategy that integrates exhaustive pattern enumeration with efficient constraint-driven pruning, ensuring semantic plausibility while substantially improving computational efficiency. Experiments across multiple RAG benchmark tasks demonstrate that our approach rapidly produces high-fidelity, interpretable rules, achieving superior explanation quality and up to a 60% reduction in rule-discovery overhead compared to baselines. The framework establishes a lightweight, transparent, and verifiable attribution paradigm for RAG systems.
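The level-wise enumeration with Apriori-style pruning described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `probe` is a hypothetical stand-in for running the RAG system on a fixed query with a chosen set of retrieved sources, and the pruning rule shown (skip any superset of an already-found minimal rule) is one simplified instance of the constraint-driven pruning the summary mentions.

```python
from itertools import combinations

def mine_rules(sources, probe, target):
    """Find minimal source subsets whose presence yields `target`.

    Subsets are enumerated level-wise, smallest first. Apriori-style
    pruning: any candidate containing an already-discovered rule is
    skipped, since the smaller rule already explains its output.
    """
    rules = []
    for k in range(1, len(sources) + 1):
        for subset in combinations(sources, k):
            # Prune supersets of known rules instead of probing them.
            if any(set(rule) <= set(subset) for rule in rules):
                continue
            if probe(set(subset)) == target:
                rules.append(subset)
    return rules

# Toy probe (an assumption for illustration): the model ranks Oxford
# first exactly when the THE ranking article is among its sources.
sources = ["THE ranking article", "QS ranking article", "campus blog post"]
probe = lambda ctx: ("Oxford ranked first"
                     if "THE ranking article" in ctx else "no ranking")
rules = mine_rules(sources, probe, "Oxford ranked first")
```

With three sources the brute-force approach would probe all 7 non-empty subsets; pruning here skips the three supersets of the single discovered rule, which is the kind of saving the reported 60% overhead reduction refers to at larger scale.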

๐Ÿ“ Abstract
If-then rules are widely used to explain machine learning models; e.g., "if employed = no, then loan application = rejected." We present the first proposal to apply rules to explain the emerging class of large language models (LLMs) with retrieval-augmented generation (RAG). Since RAG enables LLM systems to incorporate retrieved information sources at inference time, rules linking the presence or absence of sources can explain output provenance; e.g., "if a Times Higher Education ranking article is retrieved, then the LLM ranks Oxford first." To generate such rules, a brute force approach would probe the LLM with all source combinations and check if the presence or absence of any sources leads to the same output. We propose optimizations to speed up rule generation, inspired by Apriori-like pruning from frequent itemset mining but redefined within the scope of our novel problem. We conclude with qualitative and quantitative experiments demonstrating our solutions' value and efficiency.
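The brute-force check the abstract describes, probing the LLM over source combinations to see whether a presence/absence condition always leads to the same output, can be sketched as below. All names here are hypothetical: `probe` stands in for a call to the RAG system on a fixed query, and the exhaustive loop over "free" sources is the unoptimized baseline, not the paper's pruned algorithm.

```python
from itertools import combinations

def rule_holds(present, absent, all_sources, probe, target):
    """Brute-force verification of an if-then rule.

    The rule asserts: whenever every source in `present` is retrieved
    and no source in `absent` is, the model outputs `target`. We check
    every retrieval context consistent with the rule, i.e. every
    combination of the remaining (free) sources.
    """
    free = [s for s in all_sources if s not in present and s not in absent]
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            if probe(set(present) | set(extra)) != target:
                return False
    return True

# Toy probe (an assumption for illustration), mirroring the abstract's
# example: Oxford is ranked first iff the THE ranking article is retrieved.
sources = ["THE ranking article", "QS ranking article", "campus blog post"]
probe = lambda ctx: ("Oxford ranked first"
                     if "THE ranking article" in ctx else "no ranking")

rule_holds({"THE ranking article"}, set(), sources, probe, "Oxford ranked first")
```

Absence conditions are checked the same way, e.g. `rule_holds(set(), {"THE ranking article"}, sources, probe, "no ranking")`; the paper's contribution is avoiding most of these probes via Apriori-like pruning.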
Problem

Research questions and friction points this paper is trying to address.

Explain retrieval-augmented LLM systems using if-then rules
Generate rules linking retrieved sources to LLM outputs
Optimize rule generation with Apriori-inspired pruning techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rule-based explanations for retrieval-augmented LLM systems
Optimized rule generation using Apriori-inspired pruning
Linking source presence/absence to explain output provenance
Joel Rorseth
University of Waterloo, Waterloo, Ontario, Canada
Parke Godfrey
York University, Toronto, Ontario, Canada
Lukasz Golab
University of Waterloo, Waterloo, Ontario, Canada
Divesh Srivastava
AT&T Labs-Research
Jarek Szlichta
York University, Toronto, Ontario, Canada