Language Models Model Language

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Mainstream linguistics, grounded in Saussurean and Chomskyan frameworks, criticizes LLMs for lacking “deep structure” or “grounding,” and therefore dismisses their linguistic competence as inherently deficient. Method: Drawing on Witold Mańczak’s empiricist conception of language as the totality of all that is said and written, with frequency of use as its primary governing principle, this paper proposes a usage-based perspective on language modeling. It connects historical linguistics with statistical language analysis and treats language as emerging from the frequency distributions of observable usage. Contribution: Instead of treating LLMs as deficient for lacking deep structure, this approach reframes them as capturing language through empirically grounded frequency regularities. It proposes a measurable standard for linguistic modeling, centered on attested usage, as a basis for LLM design, evaluation, and interpretation.

📝 Abstract
Linguistic commentary on LLMs, heavily influenced by the theoretical frameworks of de Saussure and Chomsky, is often speculative and unproductive. Critics challenge whether LLMs can legitimately model language, citing the need for "deep structure" or "grounding" to achieve an idealized linguistic "competence." We argue for a radical shift in perspective towards the empiricist principles of Witold Mańczak, a prominent general and historical linguist. He defines language not as a "system of signs" or a "computational system of the brain" but as the totality of all that is said and written. Above all, he identifies frequency of use of particular language elements as language's primary governing principle. Using his framework, we challenge prior critiques of LLMs and provide a constructive guide for designing, evaluating, and interpreting language models.
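As a toy illustration of the frequency principle described in the abstract (not the authors' method), the sketch below treats a tiny hypothetical corpus as "all that is said and written" and ranks word forms by their relative frequency of use. The helper name `relative_frequencies`, the sample corpus, and the naive lowercase whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter


def relative_frequencies(corpus: list[str]) -> dict[str, float]:
    """Hypothetical helper: treat 'language' as the multiset of attested
    word forms and return each form's share of all tokens.

    Assumes naive tokenization: lowercase, split on whitespace."""
    tokens = [tok.lower() for text in corpus for tok in text.split()]
    counts = Counter(tokens)
    total = sum(counts.values())
    # most_common() orders forms from most to least frequent
    # (ties keep first-seen order), so the dict is frequency-ranked.
    return {form: n / total for form, n in counts.most_common()}


# Hypothetical toy corpus standing in for "the totality of all that is said and written".
corpus = ["the cat sat", "the dog sat", "a cat ran"]
freqs = relative_frequencies(corpus)
for form, share in freqs.items():
    print(f"{form}\t{share:.3f}")
```

Real usage-based analyses would need proper tokenization and far larger corpora; the point here is only that, under this view, the object being modeled is an empirical distribution over attested forms.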
Problem

Research questions and friction points this paper is trying to address.

Challenges speculative linguistic critiques of LLMs
Proposes an empiricist framework based on frequency of use
Provides a constructive guide for designing, evaluating, and interpreting language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adopts Mańczak's empiricist frequency-based language framework
Redefines language modeling around usage and frequency principles
Provides constructive design and evaluation guidelines for LLMs