🤖 AI Summary
This study examines “compartmentalization” in large language models: the model learns separate internal representations for the same underlying concept when it is presented in different forms, such as different natural languages, different programming languages, or formal versus natural language. The result is redundant representations that consume model capacity, and sample efficiency that decreases as the number of presentations grows. The authors study the effect using synthetic parallel data and multilingual training. They find that, for small models, early multilingual learning is almost entirely compartmentalized, and that synthetic parallel data can fail to reduce compartmentalization even though the parallel data itself is easily learned. All interventions studied show a phase transition in which effectiveness depends on the number of distinct presentations, suggesting that the language modeling objective unifies representations only inconsistently.
📝 Abstract
In the training data used by large language models (LLMs), the same latent concept is often presented in multiple distinct ways: the same facts appear in English and Swahili; many functions can be expressed in both Python and Haskell; we can express propositions in both formal and natural language. We show that LLMs can exhibit compartmentalization, where they fail to identify and share statistical strength between distinct presentations of unified concepts. In the worst case, LLMs simply learn parallel internal representations of each presentation of the concept, saturating model capacity with redundancies and decreasing sample efficiency as the number of such presentations grows. We also demonstrate that synthetic parallel data can fail to reduce compartmentalization despite being easily learned itself. Under this framework, we find that, for small models, early multilingual learning is nearly entirely compartmentalized. Finally, all interventions that we study exhibit a phase transition in which their effectiveness depends on the number of distinct presentations, suggesting that the language modeling objective may only inconsistently unify representations.