What Language(s) Does Aya-23 Think In? How Multilinguality Affects Internal Language Representations

📅 2025-07-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates the internal language representations of the multilingual large language model Aya-23-8B on code-switching, cloze, and translation tasks, benchmarked against predominantly monolingual models (e.g., Llama 3 and Chinese-LLaMA-2). Using logit lens analysis, neuron specialization quantification, and cross-lingual neuron overlap measurement, the authors find: (1) during translation, the model preferentially activates shared representations across typologically similar languages rather than routing through a single pivot language; (2) language-specific neurons are significantly enriched in the final decoder layers; (3) script similarity and typological distance jointly modulate neuron activation patterns, while the code-switching ratio and the base language critically shape representational dynamics. These results indicate that multilingual pretraining restructures a model's internal representations into a distributed, typology-aware schema, advancing the understanding of multilingual representation learning and informing future work on cross-lingual semantics in foundation models.

📝 Abstract
Large language models (LLMs) excel at multilingual tasks, yet their internal language processing remains poorly understood. We analyze how Aya-23-8B, a decoder-only LLM trained on balanced multilingual data, handles code-mixed, cloze, and translation tasks compared to predominantly monolingual models like Llama 3 and Chinese-LLaMA-2. Using logit lens and neuron specialization analyses, we find: (1) Aya-23 activates typologically related language representations during translation, unlike English-centric models that rely on a single pivot language; (2) code-mixed neuron activation patterns vary with mixing rates and are shaped more by the base language than the mixed-in one; and (3) Aya-23's language-specific neurons for code-mixed inputs concentrate in final layers, diverging from prior findings on decoder-only models. Neuron overlap analysis further shows that script similarity and typological relations impact processing across model types. These findings reveal how multilingual training shapes LLM internals and inform future cross-lingual transfer research.
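The logit lens technique used in the abstract can be sketched as projecting each intermediate layer's hidden state through the unembedding matrix, yielding a per-layer vocabulary distribution that shows which tokens (and hence which languages) the model is "leaning toward" before the final layer. The sketch below uses invented random hidden states and a random unembedding matrix purely for illustration; in practice these would come from a real model such as Aya-23-8B.

```python
import math
import random

random.seed(0)
d_model, vocab = 8, 10

# Hypothetical hidden states at 4 decoder layers for one token position
# (invented values; a real analysis reads these from the model).
hidden_states = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(4)]
# Illustrative unembedding matrix W_U of shape (d_model, vocab).
W_U = [[random.gauss(0, 1) for _ in range(vocab)] for _ in range(d_model)]

def logit_lens(h, W_U):
    """Project an intermediate hidden state straight into vocabulary space."""
    logits = [sum(h[i] * W_U[i][j] for i in range(len(h)))
              for j in range(len(W_U[0]))]
    m = max(logits)                      # stabilize softmax
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]         # per-layer token distribution

for layer, h in enumerate(hidden_states):
    probs = logit_lens(h, W_U)
    top = max(range(len(probs)), key=probs.__getitem__)
    print(f"layer {layer}: top token id = {top}")
```

Tracking the language of the top tokens across layers is what lets the paper distinguish a single-pivot (e.g., English-centric) trajectory from the shared typology-aware trajectory it reports for Aya-23.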
Problem

Research questions and friction points this paper is trying to address.

Analyze Aya-23's multilingual processing mechanisms
Compare multilingual and monolingual LLMs' internal representations
Investigate neuron specialization in code-mixed language tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes Aya-23-8B multilingual processing via logit lens
Identifies typologically related language activation patterns
Reveals language-specific neuron concentration in final layers