Small Encoders Can Rival Large Decoders in Detecting Groundedness

📅 2025-06-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) frequently generate hallucinations when the external context is insufficient, falling back on internal knowledge and producing unsupported claims. To address this, we propose a lightweight grounding detection mechanism that efficiently determines, prior to answer generation, whether a query is sufficiently supported by the given document. Our method employs compact encoder models (e.g., RoBERTa and NomicBERT), fine-tuned on a domain-adapted, human-annotated binary grounding dataset. Experimental results demonstrate that our detector achieves accuracy comparable to state-of-the-art LLMs, including Llama3-8B and GPT-4o, while reducing inference latency by one to two orders of magnitude and significantly lowering computational cost. The implementation is publicly available, offering an efficient pre-generation verification paradigm for enhancing LLM reliability in production deployments.

📝 Abstract
Augmenting large language models (LLMs) with external context significantly improves their performance on natural language processing (NLP) tasks. However, LLMs struggle to answer queries reliably when the provided context lacks the necessary information, often resorting to ungrounded speculation or internal knowledge. Groundedness, i.e., generating responses strictly supported by the context, is essential for ensuring factual consistency and trustworthiness. This study focuses on detecting whether a given query is grounded in a document provided in context before costly answer generation by LLMs. Such a detection mechanism can significantly reduce both inference time and resource consumption. We show that lightweight, task-specific encoder models such as RoBERTa and NomicBERT, fine-tuned on curated datasets, can achieve accuracy comparable to state-of-the-art LLMs, such as Llama3-8B and GPT-4o, in groundedness detection while reducing inference latency by orders of magnitude. The code is available at: https://github.com/chandarlab/Hallucinate-less
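The pre-generation verification flow described in the abstract can be sketched as follows. Note that `is_grounded` here is an illustrative stand-in: it uses a simple lexical-overlap heuristic in place of the paper's fine-tuned encoder classifier (RoBERTa/NomicBERT), and `answer_with_grounding_check` is a hypothetical wrapper, not the released implementation.

```python
def is_grounded(query: str, document: str, threshold: float = 0.5) -> bool:
    """Placeholder grounding detector.

    Stands in for the paper's fine-tuned encoder that scores whether
    `document` contains enough information to answer `query`. A real
    system would run a binary sequence classifier over the
    (query, document) pair; here a lexical-overlap heuristic is used
    purely for illustration.
    """
    q_tokens = {t.lower().strip(".,?") for t in query.split()}
    d_tokens = {t.lower().strip(".,?") for t in document.split()}
    if not q_tokens:
        return False
    overlap = len(q_tokens & d_tokens) / len(q_tokens)
    return overlap >= threshold


def answer_with_grounding_check(query: str, document: str, llm) -> str:
    """Pre-generation gate: invoke the costly LLM only for grounded queries."""
    if not is_grounded(query, document):
        # Skip generation entirely; this is where the latency and
        # compute savings come from.
        return "The provided context does not contain enough information."
    return llm(query, document)
```

The design point is that the cheap detector runs first, so ungrounded queries never reach the decoder at all, which is how the paper achieves its order-of-magnitude latency reduction.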
Problem

Research questions and friction points this paper is trying to address.

Detect whether a query is grounded in the provided context
Reduce LLM inference time and resource use
Match LLM accuracy with efficient lightweight encoders
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight encoders rival large decoders
Fine-tuned RoBERTa and NomicBERT for groundedness detection
Reduce latency while maintaining high accuracy