BRIGHT+: Upgrading the BRIGHT Benchmark with MARCUS, a Multi-Agent RAG Clean-Up Suite

📅 2025-06-08
🤖 AI Summary
The BRIGHT benchmark suffers from data redundancy and semantic fragmentation introduced by web crawling, which impair multi-hop retrieval accuracy and reasoning coherence, particularly in its seven StackExchange-derived domains. To address this, we propose MARCUS, a multi-agent data-cleaning framework for RAG corpora that combines structural denoising with semantic re-chunking. MARCUS runs a pipeline of specialized LLM agents (structural cleaning, semantic segmentation, context alignment, and answer-span preservation) to construct BRIGHT-Plus, a cleaned and re-chunked version of the BRIGHT corpus. BRIGHT-Plus retains critical answer spans while improving contextual completeness and reasoning consistency. Empirical evaluation shows an average 12.7% gain in multi-hop retrieval accuracy across diverse retrievers, alongside improved reasoning performance. The code, tooling, and BRIGHT-Plus dataset are open-sourced to support further research on retrieval-augmented reasoning.

📝 Abstract
Retrieval-Augmented Generation (RAG) systems require corpora that are both structurally clean and semantically coherent. BRIGHT is a recent and influential benchmark designed to evaluate complex multi-hop retrieval across diverse, high-reasoning domains. However, its practical effectiveness is limited by common web-crawled artifacts, such as content redundancy and semantic discontinuity, that impair retrieval accuracy and downstream reasoning. Notably, we find that such issues are concentrated in seven StackExchange-derived subdomains, while other domains (e.g., Coding and Theorem-based content) remain relatively clean. In this study, we present MARCUS, a multi-agent pipeline that leverages large language models (LLMs) to systematically clean and re-chunk BRIGHT into a higher-quality corpus: BRIGHT-Plus. MARCUS applies dedicated agents for structural noise removal and semantic segmentation, preserving answer-bearing spans while improving contextual integrity. Experimental evaluations demonstrate that BRIGHT-Plus yields consistent and significant improvements in both retrieval accuracy and multi-hop reasoning across a diverse set of retrievers. We release both the BRIGHT-Plus corpus and the MARCUS pipeline to support future research on robust, reasoning-centric retrieval.

Problem

Research questions and friction points this paper is trying to address.

Cleaning web-crawled artifacts from the corpora that RAG systems retrieve from
Improving retrieval accuracy and reasoning on the BRIGHT benchmark
Addressing content redundancy and semantic discontinuity in the corpus

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent pipeline cleans web-crawled artifacts
LLMs remove noise and improve semantic coherence
Re-chunked corpus enhances retrieval and reasoning
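
The clean, re-chunk, and preserve-answer-spans flow described above can be illustrated with a small sketch. This is not the MARCUS implementation: the paper uses LLM agents for each stage, whereas the code below swaps in simple rule-based stand-ins so that it runs without any model. All names here (structural_clean, rechunk, spans_preserved, clean_document, CleanResult, max_chars) are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class CleanResult:
    chunks: list[str]
    preserved: bool  # True if every answer span still appears in some chunk


def structural_clean(text: str) -> str:
    """Rule-based stand-in for a structural-cleaning agent (assumption):
    drop lines that repeat an earlier line, then collapse blank-line runs."""
    seen: set[str] = set()
    kept: list[str] = []
    for line in text.splitlines():
        key = " ".join(line.split()).lower()  # whitespace/case-insensitive key
        if key and key in seen:
            continue  # repeat of an earlier non-blank line
        seen.add(key)
        kept.append(line.strip())
    return re.sub(r"\n{3,}", "\n\n", "\n".join(kept)).strip()


def rechunk(text: str, max_chars: int = 200) -> list[str]:
    """Stand-in for a semantic-segmentation agent (assumption):
    pack whole paragraphs into chunks instead of cutting mid-paragraph."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk at a paragraph boundary
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def spans_preserved(chunks: list[str], answer_spans: list[str]) -> bool:
    """Check that each answer-bearing span survives intact in at least one chunk."""
    return all(any(span in chunk for chunk in chunks) for span in answer_spans)


def clean_document(text: str, answer_spans: list[str], max_chars: int = 200) -> CleanResult:
    chunks = rechunk(structural_clean(text), max_chars)
    return CleanResult(chunks, spans_preserved(chunks, answer_spans))


if __name__ == "__main__":
    raw = (
        "Share this post\n\n"
        "Water boils at 100 C at sea level.\n\n"
        "Share this post\n\n"
        "At altitude the boiling point drops."
    )
    result = clean_document(raw, ["boils at 100 C"], max_chars=80)
    for chunk in result.chunks:
        print(repr(chunk))
    print("answer spans preserved:", result.preserved)
```

The preservation check is the piece worth keeping even if the cleaning stages are replaced by model calls: any stage that rewrites or re-segments text can be gated on it, so a document whose answer span is lost can be flagged or left in its original form.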