🤖 AI Summary
This study investigates how writing style influences the dispersion of embedding spaces in multilingual large language models. Using a bilingual literary corpus with controlled alternation of themes and styles, we systematically compare the stylistic sensitivity of French–English Transformer models, including multiple generations of state-of-the-art architectures. We find that writing style, not topic, dominates the dispersion structure of embeddings, challenging the prevailing topic-centric interpretability paradigm. Moreover, models exhibit systematic differences in style-induced embedding dispersion. Our work establishes a quantifiable framework for stylistic analysis of embeddings, grounded in empirical measurement of style-specific divergence. This provides both methodological foundations and empirical evidence for advancing interpretability research and enabling style-controllable text generation in large language models.
📝 Abstract
This paper analyzes how writing style affects the dispersion of embedding vectors across multiple state-of-the-art language models. While embeddings from early transformer models primarily aligned with topical content, this study examines the role of writing style in shaping embedding spaces. Using a literary corpus that alternates between topics and styles, we compare the stylistic sensitivity of language models across French and English. By analyzing the specific impact of style on embedding dispersion, we aim to better understand how language models process stylistic information, contributing to their overall interpretability.
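One simple way to quantify the embedding dispersion the abstract refers to is the mean pairwise cosine distance within a set of sentence embeddings. The sketch below is a minimal illustration under that assumption, not the paper's actual metric; the function name `embedding_dispersion` and the synthetic vectors are hypothetical.

```python
import numpy as np

def embedding_dispersion(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine distance among embedding vectors.

    Higher values indicate a more dispersed (less compact) set of
    embeddings; identical vectors yield a dispersion of 0.
    """
    # L2-normalise rows so dot products become cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    # Average cosine distance over the strictly upper triangle
    # (each distinct pair counted once, diagonal excluded).
    iu = np.triu_indices(len(embeddings), k=1)
    return float(np.mean(1.0 - sims[iu]))

rng = np.random.default_rng(0)
# Near-identical vectors (e.g. one style) vs. widely scattered ones.
tight = np.ones((20, 8)) + rng.normal(0.0, 0.01, (20, 8))
spread = rng.normal(0.0, 1.0, (20, 8))
assert embedding_dispersion(tight) < embedding_dispersion(spread)
```

Comparing this statistic between text groups that share a topic but differ in style (or vice versa) is one way to attribute dispersion differences to style rather than content.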