🤖 AI Summary
Current purely text-based large language models (LLMs) face fundamental limitations in studying key linguistic phenomena, such as speech perception, prosody, dialectal variation, and child language acquisition, because they cannot model the speech modality. This paper systematically establishes, for the first time, the indispensable role of audio-driven deep models in linguistic theory construction and proposes a novel bidirectional "technical model–linguistic interpretation" co-development paradigm. Methodologically, we integrate speech representation learning, self-supervised audio models (e.g., wav2vec 2.0, Whisper), neurobehavioral modeling, and cross-modal interpretability analysis. Our core contribution is the formalization of speech deep models as foundational infrastructure for language-cognition modeling, thereby enabling deep methodological and theoretical integration between computational and experimental linguistics and providing essential support for embodied, multimodal theories of language cognition.
📝 Abstract
Futrell and Mahowald present a useful framework bridging technology-oriented deep learning systems and explanation-oriented linguistic theories. Unfortunately, the target article's focus on generative text-based LLMs fundamentally limits fruitful interactions with linguistics, as many interesting questions about human language fall outside what is captured by written text. We argue that audio-based deep learning models can and should play a crucial role.