LAQuer: Localized Attribution Queries in Content-grounded Generation

📅 2025-06-01

🤖 AI Summary

Grounded text generation models often produce content that deviates from their sources, and existing attribution methods are either too coarse (sentence-level) or, at the sub-sentence level, not aligned with what users actually want to verify. The paper proposes Localized Attribution Queries (LAQuer), a task in which a user selects a span of generated output and the system localizes it to the corresponding source spans. It compares two approaches, prompting large language models (LLMs) and leveraging LLM internal representations, and explores a modeling framework that extends existing attributed text generation methods to LAQuer. It also proposes a new evaluation setting for the task. Experiments on multi-document summarization (MDS) and long-form question answering (LFQA) show that LAQuer methods significantly reduce the length of the attributed text.

📝 Abstract

Grounded text generation models often produce content that deviates from their source material, requiring user verification to ensure accuracy. Existing attribution methods associate entire sentences with source documents, which can be overwhelming for users seeking to fact-check specific claims. In contrast, existing sub-sentence attribution methods may be more precise but fail to align with users' interests. In light of these limitations, we introduce Localized Attribution Queries (LAQuer), a new task that localizes selected spans of generated output to their corresponding source spans, allowing fine-grained and user-directed attribution. We compare two approaches for the LAQuer task, including prompting large language models (LLMs) and leveraging LLM internal representations. We then explore a modeling framework that extends existing attributed text generation methods to LAQuer. We evaluate this framework across two grounded text generation tasks: Multi-document Summarization (MDS) and Long-form Question Answering (LFQA). Our findings show that LAQuer methods significantly reduce the length of the attributed text. Our contributions include: (1) proposing the LAQuer task to enhance attribution usability, (2) suggesting a modeling framework and benchmarking multiple baselines, and (3) proposing a new evaluation setting to promote future research on localized attribution in content-grounded generation.
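
To make the task's input and output concrete, here is a minimal sketch of a LAQuer-style query: the user selects a span of generated text, and the system returns source spans that may support it. This is not the paper's method (which uses LLM prompting or LLM internal representations); the token-overlap scoring is a stand-in chosen only for illustration, and the names `Span`, `localize`, and `min_overlap` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A character range inside one source document (hypothetical type)."""
    doc_id: str
    start: int
    end: int


def _tokens(text: str) -> set[str]:
    # Lowercased alphanumeric tokens; deliberately crude.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def localize(query: str, sources: dict[str, str], min_overlap: float = 0.5) -> list[Span]:
    """Return source sentences covering at least `min_overlap` of the query tokens.

    Assumption: lexical overlap stands in for a real attribution model.
    """
    q = _tokens(query)
    if not q:
        return []
    hits = []
    for doc_id, text in sources.items():
        # Naive sentence split: a run of non-terminators plus an optional terminator.
        for m in re.finditer(r"[^.!?]+[.!?]?", text):
            coverage = len(q & _tokens(m.group())) / len(q)
            if coverage >= min_overlap:
                hits.append(Span(doc_id, m.start(), m.end()))
    return hits
```

A caller would pass the highlighted output span as `query` and the grounding documents as `sources`, then show only the returned spans to the user instead of every source cited for the whole sentence.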

Problem

Research questions and friction points this paper is trying to address.

Grounded text generation models produce inaccurate content requiring verification

Existing attribution methods are too broad or misaligned with user needs

LAQuer enables fine-grained source attribution for specific generated spans

Innovation

Methods, ideas, or system contributions that make the work stand out.

Localizes generated spans to source spans

Uses LLM prompting and internal representations

Extends attributed text generation methods

🔎 Similar Papers

Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation

2024-07-01 · North American Chapter of the Association for Computational Linguistics · Citations: 7
