🤖 AI Summary
To address the high computational overhead, inference instability, and insufficient retrieval coverage that multi-turn prompt dependency causes for open-domain question answering (ODQA) under the retrieve-then-read paradigm, this paper proposes EmbQA, a unified embedding-level framework. Methodologically, EmbQA introduces three key components: (1) a lightweight query-embedding fine-tuning mechanism driven by contrastive learning that operates without labeled supervision; (2) an exploratory embedding expansion that enriches semantic coverage; and (3) an entropy-based selection strategy that automatically picks the most confident answer. Extensive experiments across three open-source large language models, three mainstream retrievers, and four standard ODQA benchmarks show that EmbQA consistently achieves superior accuracy and inference efficiency, outperforming recent state-of-the-art baselines across all settings.
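The summary's first component, reranking retrieved passages via a lightweight linear projection over query embeddings, can be illustrated with a rough sketch. Note this is not the paper's code: the function names, the identity-style weight matrix, and the cosine-similarity scoring are all assumptions for illustration; the actual projection in EmbQA is trained with an unsupervised contrastive objective.

```python
import math

def project(vec, weight):
    # Lightweight linear layer (hypothetical): weight is out_dim x in_dim,
    # applied to a query embedding before scoring.
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]

def cosine(a, b):
    # Standard cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rerank(query_emb, passages, weight):
    # passages: list of (passage_id, embedding).
    # Project the query, then reorder passages so that those most
    # similar to the refined query come first.
    q = project(query_emb, weight)
    return sorted(passages, key=lambda p: cosine(q, p[1]), reverse=True)

# Toy usage with a 2-D identity projection: passage "b" aligns with the query.
ranked = rerank([1.0, 0.0],
                [("a", [0.0, 1.0]), ("b", [1.0, 0.0])],
                [[1.0, 0.0], [0.0, 1.0]])
```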
📝 Abstract
Large language models have recently pushed open-domain question answering (ODQA) to new frontiers. However, prevailing retriever-reader pipelines often depend on multiple rounds of prompt-level instructions, leading to high computational overhead, instability, and suboptimal retrieval coverage. In this paper, we propose EmbQA, an embedding-level framework that alleviates these shortcomings by enhancing both the retriever and the reader. Specifically, we refine query representations via lightweight linear layers under an unsupervised contrastive learning objective, thereby reordering retrieved passages to highlight those most likely to contain correct answers. Additionally, we introduce an exploratory embedding that broadens the model's latent semantic space to diversify candidate generation, and we employ an entropy-based selection mechanism to choose the most confident answer automatically. Extensive experiments across three open-source LLMs, three retrieval methods, and four ODQA benchmarks demonstrate that EmbQA substantially outperforms recent baselines in both accuracy and efficiency.
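The entropy-based selection mechanism mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each candidate answer comes with per-token probability distributions and scores candidates by mean Shannon entropy, picking the lowest (most confident); the exact scoring rule in EmbQA may differ.

```python
import math

def mean_token_entropy(token_distributions):
    # token_distributions: one probability vector per generated token.
    # Lower mean entropy = sharper (more confident) predictions.
    total = 0.0
    for dist in token_distributions:
        total += -sum(p * math.log(p) for p in dist if p > 0)
    return total / len(token_distributions)

def select_most_confident(candidates):
    # candidates: list of (answer_text, token_distributions).
    # Return the answer whose generation was most confident overall.
    return min(candidates, key=lambda c: mean_token_entropy(c[1]))[0]

# Toy usage: the peaked distribution (0.9/0.1) beats the uniform one.
best = select_most_confident([
    ("Paris",  [[0.9, 0.1]]),
    ("London", [[0.5, 0.5]]),
])
```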