🤖 AI Summary
RAG performance is constrained by a trade-off between retrieval context size and relevance: larger contexts introduce noise that can confuse the model, while smaller ones risk omitting critical information, especially for complex queries that give the retriever little to match on. FB-RAG addresses this with a dual-lookup retrieval mechanism: a backward lookup scores context chunks by their overlap with the input query, while a forward lookup scores them by their overlap with candidate reasons and answers; the two signals jointly select the chunks most relevant for answering the query. Evaluated on 9 datasets from two leading benchmarks, FB-RAG consistently outperforms recent RAG and long-context baselines and can improve performance while reducing latency. A qualitative analysis of its strengths and shortcomings provides specific insights to guide future work.
📝 Abstract
The performance of Retrieval-Augmented Generation (RAG) systems relies heavily on the quality of the retriever and the size of the retrieved context. A sufficiently large context ensures that the relevant information is present in the input to the LLM, but it also incorporates irrelevant content that has been shown to confuse models. A smaller context, on the other hand, reduces the irrelevant information, but often at the risk of losing information necessary to answer the input question. This trade-off is especially challenging to manage for complex queries that offer little signal for retrieving the relevant chunks from the full context. To address this, we present a novel framework, called FB-RAG, which enhances the RAG pipeline by relying on a combination of backward lookup (overlap with the query) and forward lookup (overlap with candidate reasons and answers) to retrieve the specific context chunks that are most relevant for answering the input query. Our evaluations on 9 datasets from two leading benchmarks show that FB-RAG consistently outperforms RAG and long-context baselines developed recently for these benchmarks. We further show that FB-RAG can improve performance while reducing latency. We perform a qualitative analysis of the strengths and shortcomings of our approach, providing specific insights to guide future work.
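The dual-lookup idea described above can be illustrated with a minimal sketch. This is a hypothetical implementation, not the paper's actual scoring method: it uses simple token overlap as a stand-in for whatever similarity function FB-RAG employs, and the function names (`overlap_score`, `fb_rag_rank`) and the mixing weight `alpha` are assumptions for illustration only.

```python
# Hedged sketch of a forward/backward dual-lookup ranker.
# Token overlap stands in for the real similarity function; all names
# and the alpha weighting are illustrative assumptions, not the paper's method.

def overlap_score(chunk: str, reference: str) -> float:
    """Fraction of reference tokens that also appear in the chunk."""
    chunk_tokens = set(chunk.lower().split())
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return 0.0
    return len(chunk_tokens & ref_tokens) / len(ref_tokens)

def fb_rag_rank(chunks, query, candidates, alpha=0.5, top_k=2):
    """Rank chunks by a weighted mix of a backward score (overlap with the
    query) and a forward score (best overlap with any candidate answer
    or reason), then keep the top_k chunks."""
    scored = []
    for chunk in chunks:
        backward = overlap_score(chunk, query)
        forward = max((overlap_score(chunk, c) for c in candidates), default=0.0)
        scored.append((alpha * backward + (1 - alpha) * forward, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]

chunks = [
    "paris is the capital of france",
    "berlin is known for its museums",
    "the eiffel tower is in paris",
]
# The forward lookup uses candidate answers (here, a single guess "paris")
# to pull in chunks that a query-only backward lookup might rank lower.
top = fb_rag_rank(chunks, "what is the capital of france", ["paris"])
```

In this toy example the backward lookup favors chunks sharing words with the query, while the forward lookup boosts chunks that overlap with the candidate answer, so the capital-of-France chunk ranks first.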