I Have No Mouth, and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2

📅 2025-08-04
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study investigates how LLaMA-3.2-1B-Instruct, trained without speech input or explicit phonetic supervision, spontaneously acquires and leverages lexical-level phonological representations to perform rhyme-related tasks. Method: latent-space geometric analysis, attention-head functional probing, and cross-layer visualization of the model's internal representation structure. Contribution/Results: token embeddings spontaneously organize in the latent space into a structure resembling the standard IPA vowel chart for humans, revealing emergent, geometry-consistent vowel representations. The authors also identify a "phoneme mover head" whose attention pattern promotes phonemic information during rhyming tasks. These findings demonstrate that large language models can construct coherent, geometrically structured phoneme representations in the complete absence of auditory signals, uncovering an emergent mechanism for phonological cognition in language models.
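The "latent-space geometric analysis" described above can be approximated with a simple probe: collect embeddings for words grouped by vowel, project them to 2D, and check whether the vowel categories separate. The sketch below is an illustration only, not the paper's actual pipeline; the word lists and random stand-in embeddings are assumptions (in practice the vectors would come from the model's embedding matrix, e.g. via `model.get_input_embeddings()` in Hugging Face Transformers).

```python
# Minimal sketch of a vowel-structure probe on token embeddings.
# Assumption: random vectors stand in for real LLaMA embeddings.
import numpy as np

rng = np.random.default_rng(0)

# Words grouped by their stressed vowel (hypothetical example sets).
words_by_vowel = {
    "i":  ["beat", "seat", "meet"],
    "ae": ["bat", "sat", "mat"],
    "u":  ["boot", "suit", "moot"],
}
dim = 64
embeddings = {w: rng.normal(size=dim)
              for ws in words_by_vowel.values() for w in ws}

def pca_2d(X):
    """Project rows of X onto their top-2 principal components."""
    Xc = X - X.mean(axis=0)              # center the data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T                 # coordinates in the top-2 PC plane

X = np.stack([embeddings[w]
              for ws in words_by_vowel.values() for w in ws])
coords = pca_2d(X)
print(coords.shape)  # (9, 2): one 2D point per word, ready to plot
```

With real embeddings, plotting `coords` colored by vowel group is one way to eyeball whether an IPA-chart-like arrangement emerges.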

📝 Abstract
Large language models demonstrate proficiency on phonetic tasks, such as rhyming, without explicit phonetic or auditory grounding. In this work, we investigate how Llama-3.2-1B-Instruct represents token-level phonetic information. Our results suggest that Llama uses a rich internal model of phonemes to complete phonetic tasks. We provide evidence for high-level organization of phoneme representations in its latent space. In doing so, we also identify a "phoneme mover head" which promotes phonetic information during rhyming tasks. We visualize the output space of this head and find that, while notable differences exist, Llama learns a model of vowels similar to the standard IPA vowel chart for humans, despite receiving no direct supervision to do so.
Problem

Research questions and friction points this paper is trying to address.

How LLaMA 3.2 models phonetic info without auditory input
Identifying internal phoneme representations in LLaMA 3.2
Discovering phoneme organization in LLaMA's latent space
Innovation

Methods, ideas, or system contributions that make the work stand out.

Investigates Llama-3.2-1B's phonetic representations
Identifies phoneme mover head for rhyming
Visualizes IPA-like vowel model without supervision
Jack Merullo
Brown University
interpretability, language models, natural language processing, multimodal learning
Arjun Khurana
Brown University, Providence, Rhode Island
Oliver McLaughlin
Brown University, Providence, Rhode Island