Is In-Context Universality Enough? MLPs are Also Universal In-Context

📅 2025-02-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper challenges the common attribution of Transformers' advantage over traditional models to an inherent in-context learning (ICL) universality. Method: Using approximation theory for continuous functions on spaces of probability measures, together with a differentiable parameterization of activation functions, the authors prove that multilayer perceptrons (MLPs) with trainable activation functions are also universal in-context: they can approximate any continuous mapping from a context and a query to a real value, to arbitrary precision. Contribution/Results: ICL universality is therefore not exclusive to the Transformer architecture, which weakens the hypothesis that architectural universality alone explains ICL success. The study instead points to inductive bias and optimization stability as the more likely drivers of effective ICL in large language models. The theoretical framework also provides a foundation for analyzing ICL beyond attention-based architectures and motivates decoupling architectural features from emergent learning capabilities.
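A rough formalization of the claim, in notation matching the abstract (a paraphrase stated on compact domains, not the paper's exact theorem):

\[
\forall\, f \in C\big(\mathcal{P}(\mathcal{X}) \times \mathcal{X}\big),\ \forall\, \varepsilon > 0,\ \exists\ \text{an MLP } g_\theta \text{ with trainable activations s.t. } \sup_{(\mu,\,x)} \big| g_\theta(\mu, x) - f(\mu, x) \big| < \varepsilon,
\]

where $\mathcal{P}(\mathcal{X})$ denotes probability measures (contexts) over $\mathcal{X} \subseteq \mathbb{R}^d$ and $x \in \mathcal{X}$ is the query.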

📝 Abstract
The success of transformers is often linked to their ability to perform in-context learning. Recent work shows that transformers are universal in context, capable of approximating any real-valued continuous function of a context (a probability measure over $\mathcal{X} \subseteq \mathbb{R}^d$) and a query $x \in \mathcal{X}$. This raises the question: Does in-context universality explain their advantage over classical models? We answer this in the negative by proving that MLPs with trainable activation functions are also universal in-context. This suggests the transformer's success is likely due to other factors like inductive bias or training stability.
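As a concrete, deliberately simplified illustration of the setup, the sketch below builds a small NumPy MLP whose activation has trainable mixture coefficients and which maps a context plus a query to a prediction; the context is summarized by empirical moments as a crude stand-in for a probability-measure input. The names and the specific activation parameterization are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def trainable_activation(z, a):
    """Parameterized activation: a learned mixture of basis nonlinearities.

    a: shape (3,) trainable coefficients. This particular parameterization is an
    illustrative choice, not the one analyzed in the paper.
    """
    return a[0] * np.tanh(z) + a[1] * np.maximum(z, 0.0) + a[2] * z

def context_features(ctx_x, ctx_y):
    """Summarize a context {(x_i, y_i)} by fixed-size empirical-moment features.

    This stands in for feeding a probability measure to the network; the paper's
    treatment of contexts is more general.
    """
    return np.concatenate([
        ctx_x.mean(axis=0), (ctx_x ** 2).mean(axis=0),
        [ctx_y.mean(), (ctx_y ** 2).mean()],
    ])

def mlp_in_context(ctx_x, ctx_y, query, params):
    """Predict f(context, query) with a 2-layer MLP over [context features; query]."""
    W1, b1, W2, b2, a = params
    inp = np.concatenate([context_features(ctx_x, ctx_y), query])
    h = trainable_activation(inp @ W1 + b1, a)
    return h @ W2 + b2

# Toy usage: d = 2, context of 8 points, random (untrained) parameters.
d, n_ctx, hidden = 2, 8, 16
ctx_x = rng.normal(size=(n_ctx, d))
ctx_y = ctx_x.sum(axis=1)            # the context realizes the target map y = sum(x)
query = rng.normal(size=d)

in_dim = 2 * d + 2 + d               # moment features + query
params = (rng.normal(size=(in_dim, hidden)) * 0.1, np.zeros(hidden),
          rng.normal(size=(hidden,)) * 0.1, 0.0, np.array([1.0, 0.5, 0.1]))

print(mlp_in_context(ctx_x, ctx_y, query, params))
```

Training the coefficients of the activation jointly with the weights is what distinguishes this family from fixed-activation MLPs; the universality result concerns what such networks can approximate, not how well they train in practice.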
Problem

Research questions and friction points this paper is trying to address.

In-context universality of transformers
In-context universality of MLPs, for comparison
Sources of the transformer's success beyond universality
Innovation

Methods, ideas, or system contributions that make the work stand out.

MLPs with trainable activation functions
Proof of universal in-context approximation for MLPs
Challenge to the view that universality explains transformer superiority
🔎 Similar Papers
No similar papers found.