UiS-IAI@LiveRAG: Retrieval-Augmented Information Nugget-Based Generation of Responses

📅 2025-06-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
RAG systems suffer from low factual accuracy, poor source traceability, and incomplete responses. To address these challenges, we propose a modular generative framework grounded in *information nuggets*, atomic semantic units that enable fine-grained information extraction, clustering, and provenance tracking. Our method comprises five components: query rewriting and subquery expansion; paragraph retrieval and re-ranking; nugget detection and clustering; cluster ranking and summarization; and response fluency optimization, augmented by a context filtering mechanism. Experiments show that even a few subquery rewrites substantially improve recall, and that moderately increasing the number of retrieved documents helps, while excessive retrieval degrades effectiveness. The framework jointly ensures factual grounding, source attributability, comprehensive information coverage, and response coherence. By treating nuggets as first-class semantic entities, it establishes an interpretable and verifiable generation paradigm for RAG systems.
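The five stages above can be sketched end to end as a toy pipeline. This is a minimal illustration, not the authors' implementation: all function names, the keyword-overlap retriever, the sentence-split nugget detector, and the first-word "clustering" are hypothetical stand-ins for the LLM-based components the paper describes.

```python
from dataclasses import dataclass

@dataclass
class Nugget:
    text: str       # atomic unit of relevant information
    source_id: int  # provenance: index of the paragraph it came from

def rewrite_query(query: str, n_subqueries: int = 2) -> list[str]:
    """Stage 1 (sketch): keep the original query and add a few subquery rewrites."""
    # Toy rewrites append facet hints; the paper uses multi-faceted LLM rewriting.
    facets = ["definition", "examples"]
    return [query] + [f"{query} {f}" for f in facets[:n_subqueries]]

def retrieve(queries: list[str], corpus: list[str], k: int = 3) -> list[int]:
    """Stage 2 (sketch): score paragraphs by term overlap, keep the top-k."""
    scores = []
    for i, para in enumerate(corpus):
        overlap = sum(len(set(q.lower().split()) & set(para.lower().split()))
                      for q in queries)
        scores.append((overlap, i))
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

def detect_nuggets(doc_ids: list[int], corpus: list[str]) -> list[Nugget]:
    """Stage 3 (sketch): split each retrieved paragraph into sentence-level nuggets."""
    nuggets = []
    for i in doc_ids:
        for sent in corpus[i].split(". "):
            if sent:
                nuggets.append(Nugget(sent.rstrip("."), i))
    return nuggets

def cluster_and_rank(nuggets: list[Nugget]) -> list[list[Nugget]]:
    """Stages 3-4 (sketch): group nuggets by leading word, rank clusters by size."""
    clusters: dict[str, list[Nugget]] = {}
    for n in nuggets:
        clusters.setdefault(n.text.split()[0].lower(), []).append(n)
    return sorted(clusters.values(), key=len, reverse=True)

def generate(clusters: list[list[Nugget]], max_sentences: int = 3) -> str:
    """Stage 5 (sketch): one representative nugget per top cluster, with attribution."""
    picks = [c[0] for c in clusters[:max_sentences]]
    return " ".join(f"{n.text} [{n.source_id}]." for n in picks)
```

The key property the design targets is that every sentence in the output carries a pointer back to its source paragraph, so factual grounding and attribution fall out of the nugget representation rather than being bolted on afterwards.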

📝 Abstract
Retrieval-augmented generation (RAG) faces challenges related to factual correctness, source attribution, and response completeness. The LiveRAG Challenge hosted at SIGIR'25 aims to advance RAG research using a fixed corpus and a shared, open-source LLM. We propose a modular pipeline that operates on information nuggets: minimal, atomic units of relevant information extracted from retrieved documents. This multistage pipeline encompasses query rewriting, passage retrieval and reranking, nugget detection and clustering, cluster ranking and summarization, and response fluency enhancement. This design inherently promotes grounding in specific facts, facilitates source attribution, and ensures maximum information inclusion within length constraints. In this challenge, we extend our focus to also address the retrieval component of RAG, building upon our prior work on multi-faceted query rewriting. Furthermore, for augmented generation, we concentrate on improving context curation capabilities, maximizing the breadth of information covered in the response while ensuring pipeline efficiency. Our results show that combining original queries with a few sub-query rewrites boosts recall, while increasing the number of documents used for reranking and generation beyond a certain point reduces effectiveness without improving response quality.
Problem

Research questions and friction points this paper is trying to address.

Enhancing factual correctness in retrieval-augmented generation systems
Improving source attribution and response completeness in RAG
Optimizing query rewriting and context curation for better information coverage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular pipeline with information nuggets
Multi-faceted query rewriting for retrieval
Enhanced context curation for generation