🤖 AI Summary
This work investigates how large language models (LLMs) acquire and generalize simple mathematical relations (specifically, univariate linear functions) via in-context learning (ICL). Method: Using GPT-2-style Transformer models trained from scratch on rigorously controlled synthetic data, we systematically evaluate both in-distribution generalization and out-of-distribution (OOD) induction. Contribution/Results: We find that the models fail to abstract the underlying linear rule and show no evidence of implicit algorithmic reasoning (e.g., performing linear regression in-context); instead, they capture only superficial statistical regularities. Critically, they fail completely on OOD test cases, indicating that ICL here operates by memory-driven, local pattern matching rather than genuine functional induction. We propose a mathematically precise, empirically verifiable hypothesis that challenges dominant algorithmic-reasoning accounts of ICL. Our study establishes a rigorous evaluation paradigm for generalization on mathematical tasks, providing both theoretical grounding and an empirical framework for characterizing fundamental limitations of LLM reasoning.
📝 Abstract
In-context learning (ICL) has emerged as a powerful paradigm for easily adapting large language models (LLMs) to new tasks. However, our understanding of how ICL works remains limited. We investigate ICL of univariate linear functions in a controlled setup with synthetic training data, experimenting with a range of GPT-2-like Transformer models trained from scratch. Our findings challenge the prevailing narrative that transformers adopt algorithmic approaches, such as linear regression, to learn a linear function in-context. The models fail to generalize beyond their training distribution, highlighting fundamental limitations in their capacity to infer abstract task structure. Our experiments lead us to propose a mathematically precise hypothesis of what the model might be learning instead.
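The controlled setup described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration of the task format (in-context (x, y) demonstrations of a linear function, plus a held-out query); the paper's exact sampling ranges, noise model, and tokenization are assumptions here, not the authors' actual configuration.

```python
# Hypothetical sketch of the synthetic ICL task for univariate linear
# functions. Parameter ranges are illustrative assumptions, not the
# paper's exact settings.
import numpy as np

def make_icl_prompt(n_examples, rng, slope_range=(-1.0, 1.0), x_range=(-5.0, 5.0)):
    """Build one in-context prompt for a linear function y = w * x.

    Returns the context pairs, the held-out query x, its true y, and w.
    """
    w = rng.uniform(*slope_range)             # one task = one linear function
    xs = rng.uniform(*x_range, size=n_examples + 1)
    ys = w * xs                               # noiseless targets
    context = list(zip(xs[:-1], ys[:-1]))     # in-context demonstrations
    return context, xs[-1], ys[-1], w

rng = np.random.default_rng(0)
context, x_query, y_query, w = make_icl_prompt(8, rng)

# An OOD probe of the kind the abstract alludes to: a query far outside
# the training x_range. A model doing implicit linear regression would
# still predict w * x_ood; a pattern-matching model would not.
x_ood = 100.0
y_ood_true = w * x_ood
```

A model is then trained from scratch on many such prompts; the OOD evaluation asks whether it extrapolates to queries (or slopes) drawn outside the training distribution.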