ICLR: In-Context Learning of Representations

📅 2024-12-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether large language models (LLMs) can dynamically reorganize concept representations from in-context examples to match task-defined semantics (e.g., graph traversal), rather than relying solely on pretrained semantic priors. The authors design a graph-tracing task grounded in conceptual role semantics and analyze intermediate-layer activations alongside controlled semantic-relatedness experiments. Key findings: (1) increasing context size triggers a sudden, phase-transition-like reorganization of internal representations; (2) with sufficient context, model representations align significantly with the predefined graph structure; and (3) high semantic relatedness between reference concepts attenuates this alignment, revealing competition between pretrained and context-specified semantics. Based on these results, the authors propose an implicit energy-minimization mechanism to account for context-driven semantic override.
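The graph-tracing setup described above can be sketched as follows. This is an illustrative reconstruction, not the paper's released code: the grid size, concept list, and prompt format are all assumptions. Nodes of a square grid are relabeled with semantically unrelated pretrained concepts, and the in-context exemplars are traces of a random walk over the grid.

```python
import random

def grid_neighbors(n):
    """Adjacency lists of an n x n square grid; nodes indexed 0..n*n-1."""
    adj = {}
    for r in range(n):
        for c in range(n):
            i = r * n + c
            nbrs = []
            if r > 0: nbrs.append(i - n)        # node above
            if r < n - 1: nbrs.append(i + n)    # node below
            if c > 0: nbrs.append(i - 1)        # node to the left
            if c < n - 1: nbrs.append(i + 1)    # node to the right
            adj[i] = nbrs
    return adj

def random_walk_prompt(n=3, steps=20, seed=0):
    """Build a context string: a random-walk trace over concept-labeled nodes."""
    rng = random.Random(seed)
    # One concept per node; deliberately unrelated words (illustrative choice).
    concepts = ["apple", "bird", "car", "dog", "egg",
                "fish", "gold", "hat", "ink"]
    adj = grid_neighbors(n)
    node = rng.randrange(n * n)
    trace = [node]
    for _ in range(steps):
        node = rng.choice(adj[node])  # step to a uniformly random neighbor
        trace.append(node)
    return " ".join(concepts[i] for i in trace)

print(random_walk_prompt())
```

Feeding many such traces as context is what, per the summary, drives the representational reorganization as context size grows.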

📝 Abstract
Recent work has demonstrated that semantics specified by pretraining data influence how representations of different concepts are organized in a large language model (LLM). However, given the open-ended nature of LLMs, e.g., their ability to in-context learn, we can ask whether models alter these pretraining semantics to adopt alternative, context-specified ones. Specifically, if we provide in-context exemplars wherein a concept plays a different role than what the pretraining data suggests, do models reorganize their representations in accordance with these novel semantics? To answer this question, we take inspiration from the theory of conceptual role semantics and define a toy "graph tracing" task wherein the nodes of the graph are referenced via concepts seen during training (e.g., apple, bird, etc.) and the connectivity of the graph is defined via some predefined structure (e.g., a square grid). Given exemplars that indicate traces of random walks on the graph, we analyze intermediate representations of the model and find that as the amount of context is scaled, there is a sudden re-organization from pretrained semantic representations to in-context representations aligned with the graph structure. Further, we find that when reference concepts have correlations in their semantics (e.g., Monday, Tuesday, etc.), the context-specified graph structure is still present in the representations, but is unable to dominate the pretrained structure. To explain these results, we analogize our task to energy minimization for a predefined graph topology, providing evidence towards an implicit optimization process to infer context-specified semantics. Overall, our findings indicate scaling context-size can flexibly re-organize model representations, possibly unlocking novel capabilities.
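The energy-minimization analogy in the abstract can be made concrete with a standard spectral argument (a sketch under our own assumptions, not the paper's actual analysis): the Dirichlet energy sum over edges, &Sigma;<sub>(i,j)&isin;E</sub> ||x<sub>i</sub> - x<sub>j</sub>||&sup2;, is minimized (among unit-norm embeddings orthogonal to the constant vector) by the lowest nontrivial eigenvectors of the graph Laplacian, and for a square grid those eigenvectors recover the grid layout.

```python
import numpy as np

def grid_laplacian(n):
    """Unnormalized Laplacian L = D - A of an n x n square grid."""
    N = n * n
    A = np.zeros((N, N))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if r + 1 < n: A[i, i + n] = A[i + n, i] = 1  # vertical edge
            if c + 1 < n: A[i, i + 1] = A[i + 1, i] = 1  # horizontal edge
    return np.diag(A.sum(axis=1)) - A

def energy_minimizing_layout(n):
    """2-D node coordinates from the two lowest nontrivial Laplacian eigenvectors.

    These minimize the edgewise Dirichlet energy subject to orthonormality,
    so nearby grid nodes receive nearby coordinates.
    """
    L = grid_laplacian(n)
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, 1:3]             # skip the constant (zero-eigenvalue) vector

coords = energy_minimizing_layout(4)  # one 2-D embedding per grid node
```

If the model's in-context representations approach such a minimizer, their principal components should trace out the grid, which is the kind of alignment the abstract reports.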
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Contextual Semantic Adaptation
Vocabulary Understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual Learning
Adaptive Semantics
Dynamic Language Understanding