ReFactX: Scalable Reasoning with Reliable Facts via Constrained Generation

📅 2025-08-23
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address hallucination in large language models (LLMs) caused by factual knowledge gaps, this paper proposes a lightweight, retrieval-free constrained generation method that requires no external retriever or auxiliary model. The core innovation lies in textualizing knowledge graph (KG) triples and constructing a Byte-Pair Encoding (BPE) prefix tree index over them, enabling real-time, token-level output constraints during autoregressive decoding to ensure only verified KG facts are generated. Unlike retrieval-augmented generation (RAG) or tool-augmented approaches, this method eliminates pipeline complexity, error propagation, and high token overhead. It scales efficiently to KGs containing up to 800 million facts. Experiments on open-domain question answering demonstrate substantial accuracy gains over state-of-the-art RAG and tool-augmented baselines, with low latency, strong scalability, and robust cross-domain adaptability.
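The prefix-tree index described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: verbalized KG facts are tokenized and inserted into a trie keyed by token ids, so that at any decoding step the set of legal next tokens is a single trie lookup. A toy whitespace tokenizer with a made-up vocabulary stands in for the BPE tokenizer the paper uses; the class and function names (`FactTrie`, `allowed_next`, `encode`) are invented for this sketch.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # token id -> child TrieNode
        self.is_end = False  # True if a complete fact ends at this node

class FactTrie:
    """Toy prefix-tree index over tokenized, verbalized KG facts."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, token_ids):
        node = self.root
        for tok in token_ids:
            node = node.children.setdefault(tok, TrieNode())
        node.is_end = True

    def allowed_next(self, prefix_ids):
        """Token ids that may legally follow the given prefix."""
        node = self.root
        for tok in prefix_ids:
            if tok not in node.children:
                return set()  # prefix is not part of any indexed fact
            node = node.children[tok]
        return set(node.children)

# Toy vocabulary standing in for BPE token ids.
vocab = {w: i for i, w in enumerate(
    ["Paris", "is", "the", "capital", "of", "France", "Berlin", "Germany"])}

def encode(text):
    return [vocab[w] for w in text.split()]

trie = FactTrie()
trie.insert(encode("Paris is the capital of France"))
trie.insert(encode("Berlin is the capital of Germany"))

# After "Paris is the capital of", only "France" continues an indexed fact.
print(trie.allowed_next(encode("Paris is the capital of")))  # -> {5}
```

Because the lookup advances one node per generated token, the per-step cost is independent of the number of indexed facts, which is consistent with the scalability claim above.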

๐Ÿ“ Abstract
Knowledge gaps and hallucinations are persistent challenges for Large Language Models (LLMs), which generate unreliable responses when lacking the necessary information to fulfill user instructions. Existing approaches, such as Retrieval-Augmented Generation (RAG) and tool use, aim to address these issues by incorporating external knowledge. Yet, they rely on additional models or services, resulting in complex pipelines, potential error propagation, and often requiring the model to process a large number of tokens. In this paper, we present a scalable method that enables LLMs to access external knowledge without depending on retrievers or auxiliary models. Our approach uses constrained generation with a pre-built prefix-tree index. Triples from a Knowledge Graph are verbalized as textual facts, tokenized, and indexed in a prefix tree for efficient access. During inference, to acquire external knowledge, the LLM generates facts with constrained generation, which allows only sequences of tokens that form an existing fact. We evaluate our proposal on Question Answering and show that it scales to large knowledge bases (800 million facts), adapts to domain-specific data, and achieves effective results. These gains come with minimal generation-time overhead. ReFactX code is available at https://github.com/rpo19/ReFactX.
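The constrained-generation step in the abstract can be sketched in isolation. This is a hypothetical illustration under the assumption that the index lookup yields, at each step, the set of token ids that can legally extend the current fact prefix: every other position in the model's logit vector is masked to negative infinity, so greedy or sampled decoding can only emit tokens that continue an indexed fact. The function names (`mask_logits`, `greedy_pick`) and the toy logit values are invented for this sketch.

```python
import math

def mask_logits(logits, allowed_ids):
    """Return a copy of logits with disallowed positions set to -inf."""
    return [x if i in allowed_ids else -math.inf
            for i, x in enumerate(logits)]

def greedy_pick(logits):
    """Index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Toy example: unconstrained, the model prefers token 2, but only
# tokens {0, 3} continue an indexed fact, so token 3 is chosen.
logits = [0.1, 0.5, 2.0, 1.2]
constrained = mask_logits(logits, allowed_ids={0, 3})
print(greedy_pick(constrained))  # -> 3
```

In a real decoding loop this mask would be recomputed at every step from the prefix-tree lookup, which is what lets the approach guarantee that emitted facts exist in the KG without invoking any retriever.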
Problem

Research questions and friction points this paper is trying to address.

Addresses knowledge gaps and hallucinations in LLMs
Enables external knowledge access without retrievers
Scales constrained generation to large knowledge bases

Innovation

Methods, ideas, or system contributions that make the work stand out.

Constrained generation with prefix-tree index
Access external knowledge without retrievers
Scales to large knowledge bases efficiently