🤖 AI Summary
This work investigates how large language models (LLMs) coordinate knowledge recall with answer deduplication when answering one-to-many factual queries (e.g., "List cities in a given country"). Methodologically, it combines early decoding, causal tracing, attention-aggregated decoding (Token Lens), and an attention-knockout analysis of MLP outputs. The study reveals a two-phase "promote-then-suppress" mechanism: in early layers, attention propagates subject information while MLPs promote candidate answers; subsequently, attention attends to and suppresses already-generated answer tokens, and MLPs amplify this suppression signal. The authors introduce Token Lens and the attention-knockout method as tools for token-level attribution. Validation across diverse LLMs and datasets supports the mechanism's generality, establishing a unified, interpretable account of internal coordination in complex factual recall.
📝 Abstract
To answer one-to-many factual queries (e.g., listing cities of a country), a language model (LM) must simultaneously recall knowledge and avoid repeating previous answers. How are these two subtasks implemented and integrated internally? Across multiple datasets and models, we identify a promote-then-suppress mechanism: the model first recalls all answers, and then suppresses previously generated ones. Specifically, LMs use both the subject and previous answer tokens to perform knowledge recall, with attention propagating subject information and MLPs promoting the answers. Then, attention attends to and suppresses previous answer tokens, while MLPs amplify the suppression signal. Our mechanism is corroborated by extensive experimental evidence: in addition to using early decoding and causal tracing, we analyze how components use different tokens by introducing both *Token Lens*, which decodes aggregated attention updates from specified tokens, and a knockout method that analyzes changes in MLP outputs after removing attention to specified tokens. Overall, we provide new insights into how LMs' internal components interact with different input tokens to support complex factual recall. Code is available at https://github.com/Lorenayannnnn/how-lms-answer-one-to-many-factual-queries.
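As a rough illustration of the Token Lens idea described in the abstract (a sketch, not the authors' implementation — see their repository for the real code), one can take the attention update that a set of specified source tokens contributes to the residual stream at the generation position, and project that aggregated update through the unembedding matrix to read off which vocabulary tokens it promotes or suppresses. All function names, variable names, and tensor shapes below are hypothetical:

```python
import numpy as np

def token_lens(attn_weights, value_states, W_U, src_positions):
    """Decode the aggregated attention update contributed by selected
    source tokens into vocabulary-space logits (logit-lens-style probe).

    attn_weights:  (seq, seq) attention probabilities for one head
    value_states:  (seq, d_model) per-token value vectors, assumed
                   already projected back to the residual-stream basis
    W_U:           (d_model, vocab) unembedding matrix
    src_positions: indices of the source tokens to attribute
    """
    q = attn_weights.shape[0] - 1  # probe the last (generation) position
    # Aggregate only the weighted value updates flowing from chosen tokens
    update = sum(attn_weights[q, s] * value_states[s] for s in src_positions)
    # Positive logits = tokens this update promotes; negative = suppresses
    return update @ W_U

# Toy demonstration with random tensors (hypothetical sizes)
rng = np.random.default_rng(0)
seq, d_model, vocab = 6, 8, 20
attn = rng.random((seq, seq))
attn /= attn.sum(axis=-1, keepdims=True)       # normalize rows
values = rng.standard_normal((seq, d_model))
W_U = rng.standard_normal((d_model, vocab))

logits = token_lens(attn, values, W_U, src_positions=[1, 2])
print(logits.shape)  # one logit per vocabulary entry
```

Under the paper's mechanism, applying such a probe to previous-answer token positions in later layers would be expected to yield negative logits for those answers (suppression), while subject-token positions in early layers promote candidate answers.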