CHOP: Chunkwise Context-Preserving Framework for RAG on Multi Documents

📅 2026-04-17
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses retrieval confusion, information redundancy, and factual inaccuracies that arise in Retrieval-Augmented Generation (RAG) systems when semantically similar documents coexist in a multi-document setting. The authors propose CHOP, a context-aware retrieval enhancement approach with two components. The CNM-Extractor generates a compact textual signature for each chunk. The Continuity Decision Module uses large language models to assess thematic coherence and structural continuity between consecutive chunks. Together they enrich each chunk with context-aware metadata, which reduces semantic conflicts among similar documents. On benchmark datasets the method achieves a Top-1 hit rate of 90.77% and notable gains in ranking quality metrics, offering a scalable way to build high-quality knowledge bases.
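
The headline figure is a Top-1 hit rate. The summary does not spell out the evaluation protocol, so the following is only a minimal sketch of the common definition (the fraction of queries whose gold chunk appears among the top k retrieved results). The function name `hit_rate_at_k` and the input layout are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def hit_rate_at_k(
    ranked_ids: Sequence[Sequence[str]],
    gold_ids: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of queries whose gold chunk id is in the top-k results.

    ranked_ids[i] is the retriever's ranked list of chunk ids for query i.
    gold_ids[i] is the single correct chunk id for query i (an assumption;
    a benchmark may allow several correct chunks per query).
    """
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("ranked_ids and gold_ids must have the same length")
    if not gold_ids:
        return 0.0
    hits = sum(
        1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k]
    )
    return hits / len(gold_ids)
```

With `k=1` this counts only queries where the first retrieved chunk is the correct one, which is the strictest version of the metric.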

📝 Abstract
Retrieval-Augmented Generation (RAG) systems lose retrieval accuracy when similar documents coexist in the vector database, leading to redundant information, hallucinations, and factual errors. To alleviate this issue, we propose CHOP, a framework that iteratively evaluates chunk relevance with Large Language Models (LLMs) and progressively reconstructs documents by determining their association with specific topics or query types. CHOP integrates two key components: the CNM-Extractor, which generates compact per-chunk signatures capturing categories, key nouns, and model names, and the Continuity Decision Module, which preserves contextual coherence by deciding whether consecutive chunks belong to the same document flow. By prefixing each chunk with context-aware metadata, CHOP reduces semantic conflicts among similar documents and enhances retriever discrimination. Experiments on benchmark datasets show that CHOP alleviates retrieval confusion and provides a scalable approach for building high-quality knowledge bases, achieving a Top-1 Hit Rate of 90.77% and notable gains in ranking quality metrics.
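
To make the metadata-prefixing idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the extractor and the continuity decision are supplied as callables (in CHOP both are LLM-backed). The names `Signature`, `annotate_chunks`, `toy_extract`, and `toy_is_continuous`, the prefix layout, and the use of a running flow index are all hypothetical choices made for illustration. The toy stand-ins use simple string heuristics only so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Signature:
    """Compact per-chunk signature: category, key nouns, model names."""
    category: str
    key_nouns: List[str]
    model_names: List[str]

    def as_prefix(self, flow_id: int) -> str:
        # Hypothetical prefix layout; the abstract does not specify one.
        return (
            f"[flow: {flow_id} | category: {self.category} | "
            f"nouns: {', '.join(self.key_nouns)} | "
            f"models: {', '.join(self.model_names)}]"
        )


def annotate_chunks(
    chunks: Sequence[str],
    extract: Callable[[str], Signature],
    is_continuous: Callable[[str, str], bool],
) -> List[str]:
    """Prefix every chunk with its signature and a document-flow index.

    The flow index increases whenever the continuity check says that two
    consecutive chunks do not belong to the same document flow.
    """
    annotated: List[str] = []
    flow_id = 0
    for i, chunk in enumerate(chunks):
        if i > 0 and not is_continuous(chunks[i - 1], chunk):
            flow_id += 1
        annotated.append(f"{extract(chunk).as_prefix(flow_id)}\n{chunk}")
    return annotated


def toy_extract(chunk: str) -> Signature:
    """Toy stand-in for an LLM extractor: digits mark model names."""
    words = [w.strip(".,") for w in chunk.split()]
    models = [w for w in words if any(c.isdigit() for c in w)]
    nouns = [w.lower() for w in words if w.isalpha() and len(w) > 5][:3]
    return Signature(category="manual", key_nouns=nouns, model_names=models)


def toy_is_continuous(prev: str, curr: str) -> bool:
    """Toy stand-in for an LLM judgment: same flow if a model name is shared."""
    shared = set(toy_extract(prev).model_names) & set(toy_extract(curr).model_names)
    return bool(shared)
```

Each annotated string would then be embedded and indexed in place of the raw chunk, so that two chunks with near-identical wording but different model names or categories produce distinguishable retrieval keys.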
Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation
multi-document retrieval
retrieval accuracy
semantic conflicts
factual errors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chunkwise Context-Preserving
Retrieval-Augmented Generation
CNM-Extractor
Continuity Decision Module
Context-Aware Metadata