🤖 AI Summary
Existing RAG systems rely on single-granularity text segmentation, which limits their ability to serve information needs at multiple levels of abstraction and often leads to "lost in the middle" failures and context-length overflow. To address these issues, the authors propose a RAG framework built on chunks of multiple abstraction levels (MAL) that jointly retrieves and fuses text at four granularities: multi-sentence, paragraph, section, and full document. The approach combines hierarchical chunking, multi-level retrieval, and context-aware answer generation. Evaluated on question answering in the under-explored domain of Glycoscience, MAL improves AI-evaluated answer correctness by 25.739% over single-granularity RAG baselines, mitigating both granularity mismatch and token-budget constraints.
📝 Abstract
A Retrieval-Augmented Generation (RAG) model powered by a large language model (LLM) provides a faster and more cost-effective way to adapt to new data and knowledge, and it delivers more specialized responses than a pre-trained LLM alone. However, most existing approaches retrieve fixed-size chunks as references for question answering (Q/A). Such a setup addresses information needs at only a single level of abstraction and struggles to generate answers that span multiple levels. In a RAG setting, LLMs can summarize and answer questions effectively when given sufficient detail, but retrieving excessive information often triggers the "lost in the middle" problem and exceeds token limits. We propose a novel RAG approach that uses chunks at multiple abstraction levels (MAL): multi-sentence-level, paragraph-level, section-level, and document-level. We demonstrate the effectiveness of our approach in the under-explored scientific domain of Glycoscience, where it improves AI-evaluated answer correctness of Q/A by 25.739% on Glyco-related papers compared to traditional single-level RAG approaches.
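To make the multi-granularity idea concrete, here is a minimal, hypothetical sketch of the chunking step, not the paper's actual pipeline: it assumes a plain-text document whose sections begin with `# `-style headings and whose paragraphs are separated by blank lines, and it produces candidate chunks at the four abstraction levels the abstract names (sentence, paragraph, section, document). A real system would pair each level's chunks with its own retrieval index.

```python
import re


def multilevel_chunks(document: str) -> dict:
    """Split one document into retrieval chunks at four abstraction levels.

    Assumptions of this sketch: sections start with a "# " heading line,
    paragraphs are separated by blank lines, and sentences end in ., !, or ?.
    """
    chunks = {
        "sentence": [],
        "paragraph": [],
        "section": [],
        "document": [document.strip()],  # whole document is the coarsest chunk
    }
    # Split into sections on heading lines (a formatting assumption, not MAL itself).
    sections = [s.strip() for s in re.split(r"^#\s.*$", document, flags=re.M) if s.strip()]
    chunks["section"] = sections
    for section in sections:
        paragraphs = [p.strip() for p in section.split("\n\n") if p.strip()]
        chunks["paragraph"].extend(paragraphs)
        for para in paragraphs:
            # Naive sentence splitter: break after ., !, or ? followed by whitespace.
            sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", para) if s.strip()]
            chunks["sentence"].extend(sentences)
    return chunks


doc = """# Introduction

Glycans are complex carbohydrates. They play key biological roles.

They remain under-explored.

# Methods

We retrieve chunks at four granularities."""

levels = multilevel_chunks(doc)
```

At answer time, a retriever can score chunks from all four lists against the query, so a broad question can be grounded in section- or document-level chunks while a detail question draws on sentence-level ones, within the token budget.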