🤖 AI Summary
This paper addresses cross-passage evidence retrieval in multi-hop question answering. It proposes a hybrid retrieval method that combines lexical matching with dense re-ranking, and it adds an iterative, token-labeling query optimization step within a retrieval-augmented generation (RAG) framework, aiming to improve reasoning accuracy, efficiency, and interpretability in zero-shot settings. Technically, it incorporates maximal marginal relevance (MMR) and lexical-overlap signals into dense re-ranking, and adapts the EfficientRAG pipeline so that queries can be refined without fine-tuning. On the HotpotQA benchmark, the hybrid method achieves relative improvements of 50% in exact match and 47% in F1 over the standard cosine-similarity baseline. It also improves entity recall and evidence complementarity, which supports its usefulness for complex multi-hop reasoning tasks.
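The gains above are relative rather than absolute, and the summary does not spell out how the metrics are computed. The following is a minimal sketch, assuming the SQuAD-style answer normalization that is commonly used for this kind of evaluation (lowercasing, stripping punctuation and English articles). The function names are illustrative, and this is not the authors' evaluation script.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Fractional gain over the baseline: (new - baseline) / baseline."""
    return (new_score - baseline_score) / baseline_score
```

As a purely hypothetical illustration (these are not numbers from the paper), a baseline exact match of 0.20 rising to 0.30 is a gain of 10 points in absolute terms but a 50% relative improvement, which is the kind of figure the summary reports.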
📝 Abstract
Transformer-based models have advanced the field of question answering, but multi-hop reasoning, where answers require combining evidence across multiple passages, remains difficult. This paper presents a comprehensive evaluation of retrieval strategies for multi-hop question answering within a retrieval-augmented generation framework. We compare cosine similarity, maximal marginal relevance, and a hybrid method that integrates dense embeddings with lexical overlap and re-ranking. To further improve retrieval, we adapt the EfficientRAG pipeline for query optimization, introducing token labeling and iterative refinement while maintaining efficiency. Experiments on the HotpotQA dataset show that the hybrid approach substantially outperforms baseline methods, achieving a relative improvement of 50 percent in exact match and 47 percent in F1 score compared to cosine similarity. Error analysis reveals that hybrid retrieval improves entity recall and evidence complementarity, while remaining limited in handling distractors and temporal reasoning. Overall, the results suggest that hybrid retrieval-augmented generation provides a practical zero-shot solution for multi-hop question answering, balancing accuracy, efficiency, and interpretability.
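To make the retrieval strategies concrete, here is a minimal sketch of a hybrid score (dense cosine similarity blended with lexical overlap) followed by MMR-style selection. It is an illustration under stated assumptions, not the paper's implementation: the weights `alpha` and `lam`, the overlap definition, the function names, and the hand-made two-dimensional "embeddings" are all hypothetical, and a real system would use a trained encoder.

```python
import math
import re


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lexical_overlap(query, passage):
    """Fraction of distinct query tokens that also occur in the passage."""
    q_tokens = set(re.findall(r"\w+", query.lower()))
    p_tokens = set(re.findall(r"\w+", passage.lower()))
    return len(q_tokens & p_tokens) / len(q_tokens) if q_tokens else 0.0


def hybrid_mmr(query, query_vec, passages, passage_vecs, k=2, alpha=0.7, lam=0.5):
    """Return indices of k passages chosen by MMR over a hybrid relevance score.

    alpha weights dense vs. lexical relevance (assumed value).
    lam trades relevance against redundancy; lam=1.0 is plain top-k.
    """
    relevance = [
        alpha * cosine(query_vec, vec) + (1 - alpha) * lexical_overlap(query, text)
        for text, vec in zip(passages, passage_vecs)
    ]
    selected = []
    candidates = list(range(len(passages)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(passage_vecs[i], passage_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy two-hop example: passages 0 and 1 are near-duplicates, while
# passage 2 carries the second-hop evidence.
QUERY = "Where was the director of Alpha film born?"
QUERY_VEC = [0.9, 0.3]
PASSAGES = [
    "Alpha film was directed by Jane Roe.",
    "The Alpha film was directed by Jane Roe in 1999.",
    "Jane Roe was born in Oslo.",
]
PASSAGE_VECS = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]

diverse = hybrid_mmr(QUERY, QUERY_VEC, PASSAGES, PASSAGE_VECS, lam=0.5)
top_k_only = hybrid_mmr(QUERY, QUERY_VEC, PASSAGES, PASSAGE_VECS, lam=1.0)
```

On this toy data, `top_k_only` selects the two near-duplicate passages, while `diverse` keeps one of them and swaps in the passage about the birthplace. This is one way to picture the evidence complementarity the abstract refers to: a multi-hop question needs passages that add different facts, not several copies of the most similar one.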