In-Context Algorithm Emulation in Fixed-Weight Transformers

📅 2025-08-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether a frozen-weight, minimal Transformer—comprising only two layers and a single attention head, with no feed-forward networks or parameter updates—can precisely simulate diverse algorithms solely via in-context prompting. Method: We propose a prompting mechanism grounded entirely in attention: algorithmic parameters are encoded into input prompts, and the softmax sharpening property of dot-product attention is leveraged to govern information flow, enabling the fixed-weight model to reproduce target algorithmic outputs. Contribution/Results: We provide a theoretical proof that any algorithm implementable by a fixed-weight attention head can be approximated to arbitrary precision via appropriately constructed prompts. This establishes, for the first time, a rigorous formal connection between in-context learning and algorithmic simulation, revealing how large language models dynamically “switch” between algorithms through prompting. Our results furnish a formal foundation for viewing Transformers as programmable algorithm libraries.

📝 Abstract
We prove that a minimal Transformer architecture with frozen weights can emulate a broad class of algorithms through in-context prompting. In particular, for any algorithm implementable by a fixed-weight attention head (e.g., one-step gradient descent or linear/ridge regression), there exists a prompt that drives a two-layer softmax attention module to reproduce the algorithm's output with arbitrary precision. This guarantee extends even to a single-head attention layer (using longer prompts if necessary), achieving architectural minimality. Our key idea is to construct prompts that encode an algorithm's parameters into token representations, creating sharp dot-product gaps that force the softmax attention to follow the intended computation. This construction requires no feed-forward layers and no parameter updates; all adaptation happens through the prompt alone. These findings forge a direct link between in-context learning and algorithmic emulation, and offer a simple mechanism for large Transformers to serve as prompt-programmable libraries of algorithms. They illuminate how GPT-style foundation models may swap algorithms via prompts alone, establishing a form of algorithmic universality in modern Transformer models.
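The "softmax sharpening" property the abstract leans on can be illustrated with a small numerical sketch. This is not the paper's construction, just the underlying softmax behavior: scaling the dot-product scores by a large factor (the inverse temperature `beta` below is an illustrative name) drives the attention weights toward a one-hot distribution, so the head copies the value whose key best matches the query with increasing precision.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Three keys; the query has a dot-product gap favoring key 2.
keys = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.7, 0.7]])
values = np.array([10.0, 20.0, 30.0])
query = np.array([1.0, 1.0])

scores = keys @ query        # [1.0, 1.0, 1.4] — a gap of 0.4
for beta in (1.0, 10.0, 100.0):
    weights = softmax(beta * scores)
    out = weights @ values
    # As beta grows, weights approach one-hot at index 2,
    # so out approaches values[2] = 30.0 to arbitrary precision.
```

Larger scaling exponentially suppresses the non-maximal scores, which is why a sufficiently sharp dot-product gap lets attention act as a near-exact selector.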
Problem

Research questions and friction points this paper is trying to address.

Emulating algorithms with frozen-weight Transformers through prompting
Achieving algorithmic universality without parameter updates or feed-forward layers
Enabling GPT models to swap algorithms via in-context prompts alone
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fixed-weight Transformers emulate algorithms via prompts
Prompt encoding forces softmax attention computation
No feed-forward layers or parameter updates required
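As a concrete instance of the kind of algorithm the abstract names as implementable by a fixed-weight attention head, the well-known correspondence between one gradient-descent step (from zero initialization, squared loss, scalar targets) and a linear attention readout can be checked numerically. This is an illustrative sketch, not the paper's proof construction:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 8
X = rng.normal(size=(n, d))      # in-context example inputs
w_true = rng.normal(size=d)
y = X @ w_true                   # in-context targets
x_q = rng.normal(size=d)         # query token
eta = 0.1                        # learning rate

# One gradient step on squared loss from w0 = 0:
#   w1 = eta * sum_i y_i * x_i
w1 = eta * (y @ X)
gd_pred = w1 @ x_q               # prediction after one GD step

# Equivalent linear attention readout: raw dot-product scores
# x_i . x_q weight the target values y_i (no softmax, no update).
attn_pred = eta * np.sum((X @ x_q) * y)
```

The two quantities agree exactly because both compute `eta * sum_i y_i (x_i . x_q)`; this identity is the standard starting point for viewing attention as in-context gradient descent.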
Jerry Yao-Chieh Hu
Northwestern University
Machine Learning (* denotes equal contribution)
Hude Liu
Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
Jennifer Yuntong Zhang
Engineering Science, University of Toronto, Toronto, ON M5S 1A4, CA
Han Liu
Center for Foundation Models and Generative AI, Northwestern University, Evanston, IL 60208, USA; Department of Computer Science, Northwestern University, Evanston, IL 60208, USA; Department of Statistics and Data Science, Northwestern University, Evanston, IL 60208, USA