Jointly Generating and Attributing Answers using Logits of Document-Identifier Tokens

📅 2025-08-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently generate hallucinated outputs, undermining answer reliability. Existing attribution methods struggle to achieve real-time, tight alignment between answer generation and document provenance in retrieval-augmented generation (RAG), often introducing significant latency. To address this, we propose LoDIT—a novel framework that jointly models document identifiers (Doc-IDs) and token-level logits to enable simultaneous answer generation and fine-grained document attribution. LoDIT integrates Doc-ID tokenization, logits-driven contribution estimation, and dynamic aggregation, enabling on-the-fly quantification of each retrieved document’s contribution during decoding—thereby balancing faithfulness and inference efficiency. Evaluated on the Trust-Align benchmark, LoDIT significantly outperforms state-of-the-art methods in attribution accuracy and answer fidelity, while reducing end-to-end latency and demonstrating strong robustness across diverse RAG configurations.

📝 Abstract
Despite their impressive performance, Large Language Models (LLMs) remain prone to hallucination, which critically undermines their trustworthiness. While most previous work focused on tackling answer and attribution correctness, a recent line of work investigated faithfulness, with a focus on leveraging internal model signals to reflect a model's actual decision-making process while generating the answer. Nevertheless, these methods induce additional latency and have shown limitations in directly aligning token generation with attribution generation. In this paper, we introduce LoDIT, a method that jointly generates and faithfully attributes answers in RAG by leveraging specific token logits during generation. It consists of two steps: (1) marking the documents with specific token identifiers and then leveraging the logits of these tokens to estimate the contribution of each document to the answer during generation, and (2) aggregating these contributions into document attributions. Experiments on a trustworthiness-focused attributed text-generation benchmark, Trust-Align, show that LoDIT significantly outperforms state-of-the-art models on several metrics. Finally, an in-depth analysis of LoDIT shows both its efficiency in terms of latency and its robustness in different settings.
Problem

Research questions and friction points this paper is trying to address.

Reduce LLM hallucination to enhance trustworthiness
Align token generation with attribution generation
Improve efficiency and robustness in RAG systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses document-identifier tokens for attribution
Leverages token logits for joint generation
Aggregates document contributions efficiently
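The two-step mechanism above (per-step contribution estimation from Doc-ID token logits, then aggregation into document attributions) can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function name `attribute_answer`, the softmax-based per-step normalization, and the toy logit values are all assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attribute_answer(docid_logits_per_step):
    """Aggregate per-step Doc-ID logits into document attributions.

    docid_logits_per_step: for each generated answer token, the raw
    logits of the Doc-ID tokens marking the retrieved documents.
    Returns one normalized attribution score per document.
    """
    num_docs = len(docid_logits_per_step[0])
    totals = [0.0] * num_docs
    for logits in docid_logits_per_step:
        # Step 1 (assumed): turn Doc-ID logits at this decoding step
        # into a per-document contribution estimate.
        for i, p in enumerate(softmax(logits)):
            totals[i] += p
    # Step 2: aggregate step-wise contributions and normalize.
    z = sum(totals)
    return [t / z for t in totals]

# Toy example: 3 decoding steps, 2 retrieved documents.
steps = [[2.0, 0.5], [1.5, 1.0], [0.2, 2.2]]
scores = attribute_answer(steps)
```

Because the scores are computed from logits already produced during decoding, attribution adds no extra forward passes, which is consistent with the latency claim above.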
Lucas Albarede
Université de Toulouse, IRIT, Toulouse, France
Jose Moreno
Université de Toulouse, IRIT, Toulouse, France
Lynda Tamine
Professor in computer science, University of Toulouse, IRIT lab, France
Information retrieval
Luce Lefeuvre
Dir. Technologies Innovation, SNCF, Paris, France