A Comprehensive Evaluation of Transformer-Based Question Answering Models and RAG-Enhanced Design

📅 2025-09-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the difficulty of retrieving evidence spread across multiple paragraphs in multi-hop question answering, this paper evaluates retrieval strategies within a retrieval-augmented generation (RAG) framework and proposes a hybrid method that combines dense embeddings with lexical overlap and re-ranking. It also adapts the EfficientRAG pipeline with token labeling and iterative query refinement, aiming to improve accuracy, efficiency, and interpretability in a zero-shot setting without fine-tuning. The hybrid method is compared against cosine-similarity and maximal marginal relevance (MMR) retrieval. On HotpotQA, it achieves relative improvements of 50% in exact match and 47% in F1 over the cosine-similarity baseline. Error analysis shows gains in entity recall and evidence complementarity, while distractors and temporal reasoning remain weak points.
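The reported gains are relative, not absolute, so it may help to spell out the quantity. Assuming the standard definition of relative improvement, with m standing for either exact match (EM) or F1, and using hypothetical numbers that are not taken from the paper:

```latex
\Delta_{\mathrm{rel}}(m) = \frac{m_{\mathrm{hybrid}} - m_{\mathrm{cos}}}{m_{\mathrm{cos}}} \times 100\%
% Hypothetical example: EM rising from 0.20 to 0.30
\Delta_{\mathrm{rel}}(\mathrm{EM}) = \frac{0.30 - 0.20}{0.20} \times 100\% = 50\%
```

In this illustration a 50% relative gain corresponds to only 10 absolute points, which is why the baseline score matters when reading such figures.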

📝 Abstract

Transformer-based models have advanced the field of question answering, but multi-hop reasoning, where answers require combining evidence across multiple passages, remains difficult. This paper presents a comprehensive evaluation of retrieval strategies for multi-hop question answering within a retrieval-augmented generation framework. We compare cosine similarity, maximal marginal relevance, and a hybrid method that integrates dense embeddings with lexical overlap and re-ranking. To further improve retrieval, we adapt the EfficientRAG pipeline for query optimization, introducing token labeling and iterative refinement while maintaining efficiency. Experiments on the HotpotQA dataset show that the hybrid approach substantially outperforms baseline methods, achieving a relative improvement of 50 percent in exact match and 47 percent in F1 score compared to cosine similarity. Error analysis reveals that hybrid retrieval improves entity recall and evidence complementarity, while remaining limited in handling distractors and temporal reasoning. Overall, the results suggest that hybrid retrieval-augmented generation provides a practical zero-shot solution for multi-hop question answering, balancing accuracy, efficiency, and interpretability.
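The three retrieval strategies compared in the abstract can be sketched as ranking functions over precomputed passage embeddings. This is a minimal illustration, assuming plain-Python vectors; the lexical-overlap score, the mixing weight `alpha`, and the MMR trade-off `lam` are hypothetical choices rather than the paper's settings, and the hybrid shown is a simple linear fusion, not necessarily the authors' exact re-ranker.

```python
import math
import re
from typing import List, Sequence, Set

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two dense vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def word_set(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens used for the lexical signal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_overlap(query: str, passage: str) -> float:
    """Assumed lexical signal: fraction of query tokens that occur in the passage."""
    q = word_set(query)
    return len(q & word_set(passage)) / len(q) if q else 0.0


def cosine_top_k(q_vec: Vector, p_vecs: List[Vector], k: int) -> List[int]:
    """Baseline: rank passages by dense cosine similarity only."""
    order = sorted(range(len(p_vecs)), key=lambda i: cosine(q_vec, p_vecs[i]), reverse=True)
    return order[:k]


def mmr_top_k(q_vec: Vector, p_vecs: List[Vector], k: int, lam: float = 0.7) -> List[int]:
    """Greedy MMR: balance relevance to the query against redundancy with earlier picks."""
    selected: List[int] = []
    candidates = list(range(len(p_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i: int) -> float:
            redundancy = max((cosine(p_vecs[i], p_vecs[j]) for j in selected), default=0.0)
            return lam * cosine(q_vec, p_vecs[i]) - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


def hybrid_top_k(query: str, q_vec: Vector, passages: List[str],
                 p_vecs: List[Vector], k: int, alpha: float = 0.5) -> List[int]:
    """Hypothetical hybrid: re-rank by a linear mix of dense and lexical scores."""
    scores = [alpha * cosine(q_vec, v) + (1 - alpha) * lexical_overlap(query, p)
              for p, v in zip(passages, p_vecs)]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)[:k]


# Toy example: passages 0 and 1 share an embedding (near-duplicates).
q_vec = [1.0, 0.0]
p_vecs = [[0.8, 0.6], [0.8, 0.6], [0.6, -0.8]]
passages = ["inception is a film", "a film about dreams",
            "christopher nolan directed inception"]
print(cosine_top_k(q_vec, p_vecs, 2))  # [0, 1]
print(mmr_top_k(q_vec, p_vecs, 2))  # [0, 2]
print(hybrid_top_k("who directed inception", q_vec, passages, p_vecs, 2))  # [2, 0]
```

In the toy run, cosine ranking returns both near-duplicates, MMR swaps the second one for a dissimilar passage, and the hybrid promotes the passage that shares the query's key terms, which mirrors the entity-recall effect the abstract describes.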

Problem


Evaluating retrieval strategies for multi-hop question answering

Improving retrieval through hybrid methods and query optimization

Assessing performance on HotpotQA with accuracy and efficiency trade-offs
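Accuracy on HotpotQA is reported as exact match (EM) and F1. A minimal sketch of both, assuming SQuAD-style answer normalization (lowercase, strip punctuation and the articles a/an/the, collapse whitespace), which HotpotQA's answer evaluation also follows; it omits HotpotQA's special handling of yes/no answers and its supporting-fact metrics.

```python
import re
import string
from collections import Counter
from typing import List, Tuple

PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squash spaces."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_toks = normalize_answer(prediction).split()
    gold_toks = normalize_answer(gold).split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def evaluate(preds: List[str], golds: List[str]) -> Tuple[float, float]:
    """Mean EM and mean F1 over paired predictions and gold answers."""
    n = len(golds)
    em = sum(exact_match(p, g) for p, g in zip(preds, golds)) / n
    f = sum(f1(p, g) for p, g in zip(preds, golds)) / n
    return em, f


print(exact_match("The Eiffel Tower!", "eiffel tower"))  # 1.0
print(round(f1("Paris, France", "Paris"), 3))  # 0.667
```

F1 gives partial credit where EM gives none, so the two metrics can move by different relative amounts, as in the 50% versus 47% gains reported above.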

Innovation


Hybrid retrieval combining dense embeddings with lexical overlap

Adapted EfficientRAG pipeline with token labeling

Iterative query refinement maintaining efficiency
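The token-labeling and iterative-refinement loop can be pictured as: retrieve, label the useful tokens in the new passage, fold them into the next query, and stop when no new evidence appears. The sketch assumes a hypothetical `Retriever` callable and replaces the trained components (EfficientRAG uses small models to label tokens and decide when to stop) with a deliberately crude heuristic that keeps capitalized words; `label_tokens`, `refine_query`, `toy_retrieve`, and `CORPUS` are illustrative names, not the paper's code.

```python
from typing import Callable, List, Tuple

# Hypothetical retriever interface: query string -> passages, best first.
Retriever = Callable[[str], List[str]]


def label_tokens(query: str, passage: str) -> List[str]:
    """Hypothetical heuristic labeler: keep capitalized words not already in the query.

    Stands in for a trained token classifier; it is not the paper's labeler.
    """
    seen = {w.lower() for w in query.split()}
    kept: List[str] = []
    for raw in passage.split():
        word = raw.strip(".,;:!?\"'()")
        if word[:1].isupper() and word.lower() not in seen:
            kept.append(word)
            seen.add(word.lower())
    return kept


def refine_query(query: str, retrieve: Retriever, max_hops: int = 3) -> Tuple[str, List[str]]:
    """Iteratively extend the query with labeled tokens from each unseen passage."""
    current, evidence = query, []
    for _ in range(max_hops):
        fresh = [p for p in retrieve(current) if p not in evidence]
        if not fresh:  # no unseen evidence: stop (stand-in for a learned stop decision)
            break
        top = fresh[0]
        evidence.append(top)
        new_tokens = label_tokens(current, top)
        if not new_tokens:  # nothing useful to carry into the next hop
            break
        current = f"{current} {' '.join(new_tokens)}"
    return current, evidence


# Toy corpus and word-overlap retriever, for illustration only.
CORPUS = [
    "Inception was directed by Christopher Nolan.",
    "Christopher Nolan was born in London.",
]


def toy_retrieve(query: str) -> List[str]:
    q = set(query.lower().split())

    def overlap(p: str) -> int:
        return len(q & set(p.lower().strip(".").split()))

    return sorted((p for p in CORPUS if overlap(p) > 0), key=overlap, reverse=True)


final_query, hops = refine_query("birthplace of the Inception director", toy_retrieve)
print(final_query)  # birthplace of the Inception director Christopher Nolan London
print(len(hops))  # 2
```

The second passage shares no words with the original question, so it is only reachable after the first hop adds "Christopher Nolan" to the query; this bridge-entity pattern is what makes iterative refinement useful for multi-hop questions.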
