Bridging the Long-Tail Gap: Robust Retrieval-Augmented Relation Completion via Multi-Stage Paraphrase Infusion

📅 2026-04-24

🤖 AI Summary

This work addresses the poor performance of large language models (LLMs) on long-tail and sparsely represented relations in relation completion, where existing retrieval-augmented generation (RAG) approaches offer limited gains. The authors propose RC-RAG, a framework that incorporates relation paraphrases into each stage of the RAG pipeline (retrieval, summarization, and reasoning) to broaden the semantic coverage of target relations without any model fine-tuning. By combining multi-stage paraphrase infusion, relation-aware summarization, and paraphrase-guided reasoning, RC-RAG yields substantial improvements across five LLMs on two benchmark datasets. In long-tail settings, the best-performing LLM augmented with RC-RAG gains 40.6 Exact Match (EM) points over its standalone performance and outperforms two strong RAG baselines by 16.0 and 13.8 EM points, respectively, while keeping computational overhead low.
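The gains above are reported in Exact Match (EM) points. As a rough illustration of how EM is conventionally computed, here is a minimal sketch that assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace); the paper's exact normalization rules are not given here, so treat this as an assumption. The function names `normalize_answer`, `exact_match`, and `em_score` are illustrative, not taken from the paper.

```python
import re
import string

# Assumption: SQuAD-style normalization; the paper's own rules may differ.
_PUNCT = set(string.punctuation)

def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """1.0 if the normalized prediction equals any normalized gold answer, else 0.0."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))

def em_score(predictions: list[str], gold: list[list[str]]) -> float:
    """Corpus-level EM on a 0-100 scale; a gain of N 'EM points' is a difference on this scale."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, g) for p, g in zip(predictions, gold))
    return 100.0 * hits / len(predictions)

# Toy example (hypothetical data): two of three predictions match exactly.
preds = ["Warsaw", "the City of Paris", "Lyon"]
gold = [["Warsaw"], ["Paris"], ["Lyon, France", "Lyon"]]
print(round(em_score(preds, gold), 1))  # prints 66.7
```

Because EM is all-or-nothing, a semantically correct but differently worded answer ("the City of Paris" vs. "Paris") scores zero, which is one reason EM is a strict metric for relation completion.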

📝 Abstract

Large language models (LLMs) struggle with relation completion (RC), both with and without retrieval-augmented generation (RAG), particularly when the required information is rare or sparsely represented. To address this, we propose a novel multi-stage paraphrase-guided relation-completion framework, RC-RAG, that systematically incorporates relation paraphrases across multiple stages. In particular, RC-RAG: (a) integrates paraphrases into retrieval to expand lexical coverage of the relation, (b) uses paraphrases to generate relation-aware summaries, and (c) leverages paraphrases during generation to guide reasoning for relation completion. Importantly, our method does not require any model fine-tuning. Experiments with five LLMs on two benchmark datasets show that RC-RAG consistently outperforms several RAG baselines. In long-tail settings, the best-performing LLM augmented with RC-RAG improves by 40.6 Exact Match (EM) points over its standalone performance and surpasses two strong RAG baselines by 16.0 and 13.8 EM points, respectively, while maintaining low computational overhead.
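The three stages listed in (a) to (c) can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the paraphrase list is hand-written, retrieval is plain token overlap rather than the paper's retriever, the relation-aware summary is approximated by sentence filtering instead of an LLM-generated summary, and the final LLM call is omitted, so the sketch stops at building the prompt. The names `Query`, `retrieve`, `summarize`, and `build_prompt` are hypothetical.

```python
import re
from dataclasses import dataclass

def tokenize(text: str) -> set[str]:
    # Crude lowercase word tokens; illustration only.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

@dataclass
class Query:
    subject: str
    relation: str
    paraphrases: list[str]  # hypothetical: hand-written here, not generated as in the paper

def retrieve(query: Query, corpus: list[str], k: int = 2) -> list[str]:
    """Stage (a), assumed form: score passages against the subject plus every relation phrasing."""
    subj = tokenize(query.subject)
    phrasings = [query.relation] + query.paraphrases

    def score(passage: str) -> int:
        toks = tokenize(passage)
        if not subj <= toks:  # require every subject token to appear
            return 0
        # Best overlap with any phrasing, so a paraphrase can match when the canonical name does not.
        return len(subj) + max(len(tokenize(p) & toks) for p in phrasings)

    ranked = sorted(corpus, key=score, reverse=True)
    return [p for p in ranked[:k] if score(p) > 0]

def summarize(query: Query, passages: list[str]) -> str:
    """Stage (b), approximated: keep only sentences containing some phrasing of the relation."""
    cues = [tokenize(p) for p in [query.relation] + query.paraphrases]
    kept = []
    for passage in passages:
        for sent in re.split(r"(?<=[.!?])\s+", passage):
            if any(cue <= tokenize(sent) for cue in cues):
                kept.append(sent.strip())
    return " ".join(kept)

def build_prompt(query: Query, summary: str) -> str:
    """Stage (c), assumed form: list the paraphrases in the prompt to guide the model's reasoning."""
    alts = "; ".join(query.paraphrases)
    return (
        f"Context: {summary}\n"
        f"The relation '{query.relation}' may also be expressed as: {alts}.\n"
        f"Question: What is the {query.relation} of {query.subject}?\n"
        "Answer with the entity name only."
    )

# Toy usage (hypothetical data).
corpus = [
    "Marie Curie was born in Warsaw. She later moved to Paris.",
    "Marie Curie won two Nobel Prizes.",
    "Warsaw is the capital of Poland.",
]
q = Query("Marie Curie", "place of birth", ["born in", "birthplace", "native of"])
passages = retrieve(q, corpus)
summary = summarize(q, passages)
prompt = build_prompt(q, summary)
print(prompt)
```

In this toy corpus the canonical phrase "place of birth" never appears, yet the paraphrase "born in" lets both retrieval and summarization surface the relevant sentence. This mirrors, in miniature, the lexical-sparsity problem the abstract describes.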

Problem

relation completion

long-tail

retrieval-augmented generation

large language models

lexical sparsity

Innovation

relation completion

retrieval-augmented generation

paraphrase infusion