RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Retrieval-augmented large language models (RAG-LLMs) offer few mechanisms for explaining their outputs, which makes it difficult to trace why a given answer was produced and to assess its safety. To address this limitation, this work presents RUBEN, an interactive explanation tool that uses novel pruning strategies to efficiently find a minimal set of rules that subsumes all other rules explaining a model output. The work also demonstrates novel applications of these compact rule sets to LLM safety: testing the resiliency of safety training and the effectiveness of adversarial prompt injections. The demonstration presents the resulting rules as concise and interpretable, and as practically useful for validating the robustness of safety training and identifying adversarial attacks.

📝 Abstract

This paper demonstrates RUBEN, an interactive tool for discovering minimal rules to explain the outputs of retrieval-augmented large language models (LLMs) in data-driven applications. We leverage novel pruning strategies to efficiently identify a minimal set of rules that subsume all others. We further demonstrate novel applications of these rules for LLM safety, specifically to test the resiliency of safety training and the effectiveness of adversarial prompt injections.
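The idea of a minimal rule set that subsumes all other rules can be illustrated with a small sketch. The abstract does not specify what a rule looks like or which pruning strategies RUBEN uses, so everything here is an assumption made for illustration: a rule is taken to be a set of retrieved sources whose presence in the context is enough to produce a given output, the model is treated as a black-box predicate, and the pruning shown is a generic level-wise superset prune rather than the paper's method. The names `minimal_rules`, `produces_output`, and `toy_model` are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional, Sequence


def minimal_rules(
    sources: Sequence[str],
    produces_output: Callable[[FrozenSet[str]], bool],
    max_size: Optional[int] = None,
) -> List[FrozenSet[str]]:
    """Find minimal subsets of `sources` for which `produces_output` is true.

    Assumption (not from the paper): a "rule" is a set of retrieved sources
    that suffices to yield the output of interest. Candidates are explored
    smallest first, so any candidate that contains an already-found rule is
    subsumed by it and can be skipped without querying the model.
    """
    limit = len(sources) if max_size is None else min(max_size, len(sources))
    found: List[FrozenSet[str]] = []
    for size in range(1, limit + 1):
        for combo in combinations(sources, size):
            candidate = frozenset(combo)
            # Generic superset pruning: a smaller rule already covers this one.
            if any(rule <= candidate for rule in found):
                continue
            if produces_output(candidate):
                found.append(candidate)
    return found


def toy_model(context: FrozenSet[str]) -> bool:
    """Hypothetical stand-in for querying a retrieval-augmented LLM."""
    return "doc_a" in context or {"doc_b", "doc_c"} <= context


rules = minimal_rules(["doc_a", "doc_b", "doc_c", "doc_d"], toy_model)
print(rules)
```

In this toy setting the search returns two rules, `{doc_a}` and `{doc_b, doc_c}`, and queries the stand-in model for only 7 of the 15 non-empty subsets, because every superset of a found rule is skipped. In a real system each query would be an LLM call, which is why pruning matters.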

Problem

Research questions and friction points this paper is trying to address.

Rule-Based Explanations

Retrieval-Augmented LLMs

LLM Safety

Adversarial Prompt Injections

Explainability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Rule-Based Explanation

Retrieval-Augmented LLM

Pruning Strategy