Precise Information Control in Long-Form Text Generation

📅 2025-06-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address intrinsic hallucination in large language models (LLMs), the generation of plausible but unsupported content in long-form text, this paper introduces the Precise Information Control (PIC) task: models must generate long-form text grounded exclusively in a set of verifiable, self-contained short statements, without adding any information not entailed by them. The contributions are threefold: (1) a formalization of PIC under full and partial claim-coverage settings; (2) PIC-Bench, a benchmark spanning eight long-form generation tasks; and (3) a post-training framework built on weakly supervised preference data construction. The resulting 8B PIC-LM reaches 91.0% F1 in the full PIC setting (up from 69.1%, a 21.9-point gain), improves exact match recall by 17.1% on ambiguous QA with retrieval, and improves factual precision by 30.5% on a birthplace verification task.

๐Ÿ“ Abstract
A central challenge in modern language models (LMs) is intrinsic hallucination: the generation of information that is plausible but unsubstantiated relative to input context. To study this problem, we propose Precise Information Control (PIC), a new task formulation that requires models to generate long-form outputs grounded in a provided set of short self-contained statements, known as verifiable claims, without adding any unsupported ones. For comprehensiveness, PIC includes a full setting that tests a model's ability to include exactly all input claims, and a partial setting that requires the model to selectively incorporate only relevant claims. We present PIC-Bench, a benchmark of eight long-form generation tasks (e.g., summarization, biography generation) adapted to the PIC setting, where LMs are supplied with well-formed, verifiable input claims. Our evaluation of a range of open and proprietary LMs on PIC-Bench reveals that, surprisingly, state-of-the-art LMs still intrinsically hallucinate in over 70% of outputs. To alleviate this lack of faithfulness, we introduce a post-training framework, using a weakly supervised preference data construction method, to train an 8B PIC-LM with stronger PIC ability, improving from 69.1% to 91.0% F1 in the full PIC setting. When integrated into end-to-end factual generation pipelines, PIC-LM improves exact match recall by 17.1% on ambiguous QA with retrieval, and factual precision by 30.5% on a birthplace verification task, underscoring the potential of precisely grounded generation.
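The full-setting F1 described in the abstract can be read as a claim-level precision/recall score: precision over claims expressed in the output, recall over the input claims the output must cover. The sketch below illustrates this framing only; the `entails` verifier is a hypothetical placeholder (the paper's actual entailment checker and claim-extraction step are not specified here).

```python
def pic_f1(input_claims, output_claims, entails):
    """Illustrative claim-level F1 for the full PIC setting.

    input_claims:  the verifiable claims the output must express
    output_claims: claims extracted from the generated text
    entails(a, b): hypothetical check that claim a supports claim b
    """
    if not input_claims or not output_claims:
        return 0.0
    # Precision: fraction of output claims supported by some input claim
    supported = sum(
        any(entails(i, o) for i in input_claims) for o in output_claims
    )
    precision = supported / len(output_claims)
    # Recall: fraction of input claims covered by some output claim
    covered = sum(
        any(entails(o, i) for o in output_claims) for i in input_claims
    )
    recall = covered / len(input_claims)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With exact string match as a toy stand-in for entailment, an output expressing only one of two input claims scores perfect precision but 0.5 recall, giving F1 of 2/3.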
Problem

Research questions and friction points this paper is trying to address.

Control hallucination in long-form text generation
Generate outputs grounded in verifiable claims
Improve faithfulness in factual generation pipelines
Innovation

Methods, ideas, or system contributions that make the work stand out.

Precise Information Control task formulation
PIC-Bench benchmark for long-form generation
Weakly supervised post-training framework