🤖 AI Summary
To address inaccurate retrieval and frequent hallucinations in Retrieval-Augmented Generation (RAG) for multiple-choice question answering (MCQA) in the telecommunications domain, this paper proposes a first-token-probability-guided dynamic RAG framework. The method leverages the probability distribution over the first token of the large language model (LLM) answer as a lightweight, interpretable confidence signal to dynamically adjust key RAG parameters, including retrieved chunk count, chunk window size, and context composition, enabling online hyperparameter tuning and adaptive context reconstruction. According to the authors, this is the first work to employ first-token probability as a real-time control signal for RAG optimization. Evaluated on a telecom-specific MCQA benchmark, the approach improves answer accuracy while reducing hallucination rates, suggesting that probability-driven adaptation enhances the robustness and reliability of domain-specific RAG systems.
📝 Abstract
Large Language Models (LLMs) have garnered significant attention for their impressive general-purpose capabilities. For applications requiring intricate domain knowledge, Retrieval-Augmented Generation (RAG) has shown a distinct advantage in incorporating domain-specific information into LLMs. However, existing RAG research has not fully addressed the challenges of Multiple Choice Question Answering (MCQA) in telecommunications, particularly in terms of retrieval quality and mitigating hallucinations. To tackle these challenges, we propose a novel first-token-probability-guided RAG framework. This framework leverages confidence scores to optimize key hyperparameters, such as the number of retrieved chunks and the chunk window size, while dynamically adjusting the context. Our method begins by retrieving the most relevant chunks and generating a single token as the potential answer. The probabilities of all options are then normalized to serve as confidence scores, which guide the dynamic adjustment of the context. By iteratively optimizing the hyperparameters based on these confidence scores, we continuously improve RAG performance. We conducted experiments to validate the effectiveness of our framework, demonstrating its potential to enhance accuracy in domain-specific MCQA tasks.
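The confidence-guided loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `retrieve` and `ask_llm` are hypothetical stand-ins for the retriever and the LLM's first-token log-probability output, only the chunk count is adapted here (the paper also tunes window size and context composition), and the 0.7 threshold is an assumed value.

```python
import math


def option_confidences(first_token_logprobs, options=("A", "B", "C", "D")):
    """Normalize the LLM's first-token log-probabilities over the answer
    options so they sum to 1, yielding interpretable confidence scores."""
    unnorm = {o: math.exp(first_token_logprobs[o]) for o in options}
    total = sum(unnorm.values())
    return {o: p / total for o, p in unnorm.items()}


def dynamic_rag_answer(question, retrieve, ask_llm,
                       max_chunks=8, threshold=0.7):
    """Grow the retrieved context until the top option's normalized
    confidence clears the threshold, then return that option."""
    n_chunks = 1
    while True:
        context = retrieve(question, n_chunks)  # top-n relevant chunks
        conf = option_confidences(ask_llm(question, context))
        best = max(conf, key=conf.get)
        if conf[best] >= threshold or n_chunks >= max_chunks:
            return best, conf[best], n_chunks
        n_chunks += 1
```

A real deployment would obtain the log-probabilities from the model's first generated token (e.g. via an API's top-token log-probability output) and could apply the same confidence test to other hyperparameters, such as the sliding window size.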