FineFilter: A Fine-grained Noise Filtering Mechanism for Retrieval-Augmented Large Language Models

📅 2025-02-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
In retrieval-augmented generation (RAG), retrieved documents often contain noise that obscures critical answer clues. To address this, we propose a sentence-level Min-Max optimization framework for fine-grained noise filtering. First, a context-aware clue extractor identifies the answer-bearing sentence. Second, a relevance re-ranker is trained using feedback from a generative module to improve discriminative capability. Third, a differentiable truncation optimizer dynamically prunes redundant content by minimizing the number of essential clues required for correct answer generation. This work introduces the first sentence-level Min-Max noise filtering paradigm, enabling modular, multi-stage fine-tuning. Evaluated on three QA benchmarks, our method achieves up to 11.3% absolute accuracy gain on complex reasoning tasks and reduces inference cost by up to 42%, significantly outperforming state-of-the-art baselines.

📝 Abstract
Retrieved documents containing noise can hinder Retrieval-Augmented Generation (RAG) from detecting answer clues, necessitating noise filtering mechanisms to enhance accuracy. Existing methods use re-ranking or summarization to identify the most relevant sentences, but directly and accurately locating answer clues in these large-scale, complex documents remains challenging. Unlike these document-level operations, we treat noise filtering as a sentence-level MinMax optimization problem: first identifying potential clues from multiple documents using contextual information, then ranking them by relevance, and finally retaining the fewest clues through truncation. In this paper, we propose FineFilter, a novel fine-grained noise filtering mechanism for RAG consisting of a clue extractor, a re-ranker, and a truncator. We optimize each module to tackle complex reasoning challenges: (1) the clue extractor uses sentences containing the answer, together with similar ones, as fine-tuning targets, aiming to extract sufficient potential clues; (2) the re-ranker is trained to prioritize effective clues based on real feedback from the generation module, with clues capable of producing the correct answer as positive samples and the rest as negative; (3) the truncator takes the minimum number of clues needed to answer the question (the truncation point) as its fine-tuning target and truncates the re-ranked clues to achieve fine-grained noise filtering. Experiments on three QA datasets demonstrate that FineFilter significantly outperforms baselines in both performance and inference cost. Further analysis of each module shows the effectiveness of our optimizations for complex reasoning.
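The extract → re-rank → truncate dataflow described above can be sketched as follows. This is a minimal illustration, not the paper's code: in FineFilter each stage is a fine-tuned model, whereas here a simple lexical-overlap score stands in for the learned extractor and re-ranker, and the truncation point is passed in as a parameter.

```python
# Hypothetical sketch of FineFilter's three-stage pipeline.
# All function names and the overlap-based scoring are illustrative stand-ins.

def extract_clues(question, documents, top_per_doc=3):
    """Stage 1: pull candidate answer-bearing sentences from each document.
    Lexical overlap with the question approximates the fine-tuned extractor."""
    q_terms = set(question.lower().split())
    clues = []
    for doc in documents:
        sentences = [s.strip() for s in doc.split(".") if s.strip()]
        scored = sorted(sentences,
                        key=lambda s: len(q_terms & set(s.lower().split())),
                        reverse=True)
        clues.extend(scored[:top_per_doc])
    return clues

def rerank(question, clues):
    """Stage 2: order clues by relevance (stand-in for the re-ranker
    trained on feedback from the generation module)."""
    q_terms = set(question.lower().split())
    return sorted(clues,
                  key=lambda s: len(q_terms & set(s.lower().split())),
                  reverse=True)

def truncate(ranked_clues, truncation_point):
    """Stage 3: keep only a minimal prefix of clues; in the paper the
    truncation point is predicted by a fine-tuned truncator."""
    return ranked_clues[:truncation_point]

def fine_filter(question, documents, truncation_point=2):
    """End-to-end: extract candidate clues, re-rank them, then truncate."""
    clues = extract_clues(question, documents)
    return truncate(rerank(question, clues), truncation_point)
```

The point of the sketch is the interface between stages: the generator ultimately receives only the short truncated clue list rather than the full retrieved documents, which is where the inference-cost savings come from.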
Problem

Research questions and friction points this paper is trying to address.

Enhance accuracy in Retrieval-Augmented Generation
Identify answer clues in complex documents
Optimize sentence-level noise filtering mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sentence-level MinMax optimization for noise filtering
Fine-grained filtering via clue extractor and re-ranker
Truncation based on minimal required answer clues
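The third bullet, finding the minimal set of clues that still yields a correct answer, can be illustrated by the search below. This is a hedged sketch of how such truncation-point labels could be derived for fine-tuning; `generate` is a hypothetical stand-in for the RAG generation module, not an API from the paper.

```python
# Illustrative: find the smallest prefix of re-ranked clues with which the
# generator still produces the gold answer. The resulting index is the kind
# of "truncation point" target the truncator is fine-tuned to predict.

def find_truncation_point(question, ranked_clues, gold_answer, generate):
    for k in range(1, len(ranked_clues) + 1):
        context = " ".join(ranked_clues[:k])
        if gold_answer.lower() in generate(question, context).lower():
            return k          # minimum number of clues that suffices
    return len(ranked_clues)  # fall back to using all clues
```

Because clues are tried in re-ranked order, a well-trained re-ranker pushes the truncation point toward 1, which is exactly the Min side of the MinMax objective.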
Qianchi Zhang
Beihang University
Hallucination, RAG, LLM
Hainan Zhang
Beihang University
Dialogue Generation, Text Generation, Federated Learning, Natural Language Processing
Liang Pang
Associate Professor, Institute of Computing Technology, Chinese Academy of Sciences
Large Language Model, Semantic Matching, Question Answering, Text Matching, Text Generation
Hongwei Zheng
Shanghai Jiao Tong University
Computer Vision, Federated Learning
Yongxin Tong
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University, China
Zhiming Zheng
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University, China