Multiple Streams of Relation Extraction: Enriching and Recalling in Transformers

📅 2025-06-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how relation knowledge (e.g., event and entity associations) is stored and retrieved in fine-tuned large language models (LLMs). Addressing a limitation of existing localization methods such as activation patching, which disrupt residual-stream integrity, the authors propose Dynamic Weight Grafting, a technique that isolates and empirically validates two distinct information pathways: an "enrichment" pathway (extracting relation information into entity representations during early processing) and a "recall" pathway (retrieving that relational knowledge in later layers just before prediction). Combining activation analysis with component-level tracking, they localize task-specific relation extraction steps within attention and feed-forward modules. Results show that final-layer attention and feed-forward subnetworks predominantly govern relation recall; moreover, reliance on the two pathways varies by task: some tasks require both, while others succeed via a single pathway. The work reveals a staged processing paradigm for relational knowledge in the residual stream and provides empirical evidence for both the functional necessity and the structural redundancy of these pathways.

📝 Abstract
When an LLM learns a relation during finetuning (e.g., new movie releases, corporate mergers, etc.), where does this information go? Is it extracted when the model processes an entity, recalled just-in-time before a prediction, or are there multiple separate heuristics? Existing localization approaches (e.g., activation patching) are ill-suited for this analysis because they tend to replace parts of the residual stream, potentially deleting information. To fill this gap, we propose dynamic weight-grafting between fine-tuned and pre-trained language models to show that fine-tuned language models both (1) extract relation information learned during finetuning while processing entities and (2) "recall" this information in later layers while generating predictions. In some cases, models need both of these pathways to correctly generate finetuned information while, in other cases, a single "enrichment" or "recall" pathway alone is sufficient. We examine the necessity and sufficiency of these information pathways, examining what layers they occur at, how much redundancy they exhibit, and which model components are involved -- finding that the "recall" pathway occurs via both task-specific attention mechanisms and a relation extraction step in the output of the attention and the feedforward networks at the final layers before next token prediction.
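The core idea of dynamic weight grafting, as the abstract describes it, is to run a forward pass that selectively swaps in fine-tuned weights at chosen (layer, position) pairs while using pre-trained weights everywhere else, so the residual stream is never overwritten directly. The sketch below is a minimal toy illustration of that selection mechanism, not the paper's implementation: the "model" is a stack of random matrices rather than transformer blocks, and all names (`forward_with_grafting`, `graft`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_layers, seq_len = 8, 4, 5

# Toy per-layer weights for a "pre-trained" and a "fine-tuned" model.
# (Assumption: in the real setting each layer is a full transformer block;
# here one matrix per layer stands in for it.)
W_pre = [rng.normal(size=(d, d)) for _ in range(n_layers)]
W_ft = [W + 0.1 * rng.normal(size=(d, d)) for W in W_pre]


def forward_with_grafting(x, graft):
    """Run the toy stack, applying fine-tuned weights only at the
    (layer, position) pairs in `graft`; elsewhere use pre-trained weights.
    x: (seq_len, d) array of residual-stream states."""
    h = x
    for layer in range(n_layers):
        out = np.empty_like(h)
        for pos in range(seq_len):
            W = W_ft[layer] if (layer, pos) in graft else W_pre[layer]
            out[pos] = np.tanh(W @ h[pos])
        h = out
    return h


x = rng.normal(size=(seq_len, d))

# "Enrichment"-style graft: fine-tuned weights at early layers,
# over the (hypothetical) entity token positions 1-2.
h_enrich = forward_with_grafting(x, {(0, 1), (0, 2), (1, 1), (1, 2)})

# "Recall"-style graft: fine-tuned weights at the final layer,
# over the last position, just before next-token prediction.
h_recall = forward_with_grafting(x, {(n_layers - 1, seq_len - 1)})
```

Because the graft is expressed as a weight substitution rather than an activation overwrite, the residual stream at non-grafted positions is computed exactly as the base model would compute it, which is the property the paper contrasts with activation patching.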
Problem

Research questions and friction points this paper is trying to address.

How LLMs store and recall relation information during finetuning
Analyze enrichment and recall pathways in transformer layers
Identify model components involved in relation extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic weight-grafting between fine-tuned and pre-trained models
Extracting relation information during entity processing
Recalling information via task-specific attention mechanisms