🤖 AI Summary
This study investigates how Llama-3.2-1B-Instruct, trained without speech input or explicit phonetic supervision, spontaneously acquires and leverages lexical-level phonological representations to perform rhyme-related tasks.
Method: We employ latent-space geometric analysis, attention-head functional probing, and cross-layer visualization to examine the internal representation structure.
Contribution/Results: We discover that token embeddings spontaneously organize in the latent space into a structured, human-like IPA vowel chart, revealing emergent, geometry-consistent vowel representations. Furthermore, we identify a "phoneme mover head" whose attention patterns dynamically encode phonemic similarity and transformation relationships. These findings demonstrate that large language models can autonomously construct hierarchical, geometrically coherent phoneme representations, even in the complete absence of auditory signals, thereby uncovering an intrinsic, emergent mechanism for phonological cognition in language models.
📄 Abstract
Large language models demonstrate proficiency on phonetic tasks, such as rhyming, without explicit phonetic or auditory grounding. In this work, we investigate how `Llama-3.2-1B-Instruct` represents token-level phonetic information. Our results suggest that Llama uses a rich internal model of phonemes to complete phonetic tasks. We provide evidence for high-level organization of phoneme representations in its latent space. In doing so, we also identify a "phoneme mover head" which promotes phonetic information during rhyming tasks. We visualize the output space of this head and find that, while notable differences exist, Llama learns a model of vowels similar to the standard IPA vowel chart for humans, despite receiving no direct supervision to do so.
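The latent-space analysis described above amounts to projecting token embeddings into a low-dimensional space and inspecting their geometric arrangement. The following is a minimal sketch of that style of probe; the word list and the random stand-in embedding matrix are illustrative assumptions (in the paper's setting the vectors would be drawn from the model's actual embedding layer), and plain PCA stands in for whatever projection the authors use.

```python
# Hypothetical sketch of a latent-space geometric probe: project embeddings
# of words carrying different vowels into 2-D and inspect their layout.
# NOTE: the embeddings below are random stand-ins, NOT real model weights;
# in practice they would come from Llama-3.2-1B-Instruct's embedding matrix.
import numpy as np

rng = np.random.default_rng(0)

# Illustrative minimal-pair-style words spanning several English vowels.
words = ["beat", "bit", "bet", "bat", "boot", "bought"]
hidden_size = 64
E = rng.normal(size=(len(words), hidden_size))  # stand-in embeddings

# Center the embeddings and project onto the top two principal components.
X = E - E.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
coords = X @ Vt[:2].T  # shape (len(words), 2): one 2-D point per word

# In the paper, plotting these points reveals an IPA-chart-like arrangement.
for w, (x, y) in zip(words, coords):
    print(f"{w:8s} {x:+.3f} {y:+.3f}")
```

With real embeddings, one would color each point by the word's stressed vowel and compare the resulting layout against the IPA vowel chart's height/backness axes.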