🤖 AI Summary
This work addresses how the inherent constraints of natural language expression limit the reasoning of large language models on complex tasks. It proposes that structured linguistic representations can construct and activate internal cognitive schemata within these models, thereby improving their performance. The study presents a new formalization of linguistic representation design as a pathway for extending the capabilities of large language models, drawing on linguistics and symbolic representation theory. Through controlled experiments, it analyzes how different representational forms of the same underlying task influence internal model activations and outputs. The findings indicate that optimizing linguistic representations, without altering model parameters or scale, can substantially improve task performance, which supports the potential of this approach.
📝 Abstract
Although natural language is the default medium for Large Language Models (LLMs), its limited expressive capacity creates a profound bottleneck for complex problem-solving. While recent advancements in AI have relied heavily on scaling, merely internalizing knowledge does not guarantee its effective application. Defining language representation as the linguistic and symbolic constructs used to map and model the real world, this paper argues that shaping schemas through advanced language representation is the next frontier for expanding LLM intelligence. We posit that an LLM's schema (the way it activates and organizes knowledge) depends heavily on the structural and symbolic sophistication of the language used to represent a given task. This paper contributes both a new formalization of this claim and multiple lines of evidence to support it. First, we review recent empirical practices and emerging methodologies that demonstrate the substantial performance gains achievable through deliberate language representation design, even without modifying model parameters or scale. Second, we conduct controlled experiments showing that both LLM performance and internal feature activations vary under different language representations of the same underlying task. Together, these findings highlight language representation design as a promising direction for future research.