🤖 AI Summary
This paper investigates the practical feasibility of language generation in terms of sample efficiency. It shows that, for classical language families such as regular and context-free languages, successful generation may require an extraordinarily large number of positive examples. In some cases this number is not bounded by any computable function, so the sample complexity itself is uncomputable, even though generation for these families is possible in the limit.
Method: Combining formal language theory, the PAC learning framework, and Kleinberg and Mullainathan's model of language generation in the limit, the paper derives and proves strong lower bounds on sample complexity.
Contribution/Results: The work provides the first systematic characterization, from a computational perspective, of the sample barriers inherent to language generation. It argues that classical learnability theory alone cannot fully explain the empirical success of modern large language models. That success must also depend on structural properties of natural language that make effective generation possible in practice. This connects theoretical guarantees to practical performance.
📝 Abstract
Kleinberg and Mullainathan showed that, in principle, language generation is always possible: with sufficiently many positive examples, a learner can eventually produce sentences indistinguishable from those of a target language. However, the existence of such a guarantee does not speak to its practical feasibility. In this work, we show that even for simple and well-studied language families, such as regular and context-free languages, the number of examples required for successful generation can be extraordinarily large, and in some cases not bounded by any computable function. These results reveal a substantial gap between theoretical possibility and efficient learnability. They suggest that explaining the empirical success of modern language models requires a refined perspective, one that takes into account structural properties of natural language that make effective generation possible in practice.
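The following is a minimal sketch of the setting the abstract describes: a generator sees positive examples from an unknown target language and must output unseen strings that belong to it. It assumes a *finite* toy family of unary languages. The names `unary_multiples` and `generate` are hypothetical and exist only for this illustration. The generator uses a simple "intersection of consistent candidates" rule. This is **not** Kleinberg and Mullainathan's algorithm, which handles countably infinite families. For a finite family the rule is safe: the target is always consistent with its own examples, so the intersection of consistent candidates is a subset of the target. In this toy, one example already suffices, whereas the paper's point is that for regular and context-free families the required number can be enormous.

```python
from typing import Callable, Optional

# A language is modelled as a membership predicate over strings.
Language = Callable[[str], bool]


def unary_multiples(k: int) -> Language:
    """Hypothetical toy language L_k = {a^n : n >= 1 and k divides n}."""
    return lambda s: len(s) >= 1 and set(s) <= {"a"} and len(s) % k == 0


def generate(candidates: list[Language], seen: set[str],
             max_len: int = 100) -> Optional[str]:
    """Return the shortest unseen unary string lying in every candidate
    language that is consistent with all examples seen so far.

    Assumption: the search only enumerates strings over the alphabet {"a"}
    up to length max_len; it returns None if no such string is found.
    """
    # Keep only candidates that contain every observed example.
    consistent = [L for L in candidates if all(L(x) for x in seen)]
    for n in range(1, max_len + 1):
        s = "a" * n
        if s not in seen and all(L(s) for L in consistent):
            return s
    return None


if __name__ == "__main__":
    family = [unary_multiples(k) for k in range(1, 11)]
    target = unary_multiples(3)
    out = generate(family, {"aaa"})
    print(len(out), target(out))
```

With the target L_3 and the single example `"aaa"`, only L_1 and L_3 remain consistent, so the generator outputs `"a" * 6`, which lies in the target.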