Nationality encoding in language model hidden states: Probing culturally differentiated representations in persona-conditioned academic text

📅 2026-04-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether large language models encode distinguishable national-cultural representations in their hidden states when generating academic text. The authors build a corpus of research article introductions conditioned on British and Chinese academic personas, then train layer-wise logistic regression probes on the hidden states of the Gemma-3-4b-it model to locate nationality signals. Controls include shuffled-label baselines, a surface-text classifier, cross-model-family tests, and sentence-level baselines. Token positions selected by the probe are annotated for structural, lexical, and stance features. The work extends probing methodology to a sociolinguistic attribute, nationality. It reports that nationality information peaks at layer 18 with a cross-validated accuracy of 0.968, follows a non-monotonic trajectory across layers, and coincides with systematic linguistic differences at high-signal token positions, even though sentence-level analysis finds no significant nationality differences in the full generated surface text.

📝 Abstract

Large language models are increasingly used as writing tools and pedagogical resources in English for Academic Purposes, but it remains unclear whether they encode culturally differentiated representations when generating academic text. This study tests whether Gemma-3-4b-it encodes nationality-discriminative information in hidden states when generating research article introductions conditioned by British and Chinese academic personas. A corpus of 270 texts was generated from 45 prompt templates crossed with six persona conditions in a 2 x 3 design. Logistic regression probes were trained on hidden-state activations across all 35 layers, with shuffled-label baselines, a surface-text skyline classifier, cross-family tests, and sentence-level baselines used as controls. Probe-selected token positions were annotated for structural, lexical, and stance features using the Stanza NLP pipeline. The nationality probe reached 0.968 cross-validated accuracy at Layer 18, with perfect held-out classification. Nationality encoding followed a non-monotonic trajectory across layers, with structural effects strongest in the middle to upper network and lexical-domain effects peaking earlier. At high-signal token positions, British-associated patterns showed more postmodification, hedging, boosting, passive voice, and evaluative or process-oriented vocabulary, while Chinese-associated patterns showed more premodification, nominal predicates, and sociocultural or internationalisation vocabulary. However, sentence-level analysis found no significant nationality differences in the full generated surface text. The findings extend probing methodology to a sociolinguistic attribute and have practical implications for EAP and language pedagogy.
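The layer-wise probing setup with a shuffled-label control described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the activations are synthetic Gaussian vectors rather than Gemma-3-4b-it hidden states, only three toy "layers" are simulated, the logistic regression is a plain NumPy gradient-descent implementation rather than a library probe, and every name (`make_synthetic_layers`, `fit_logreg`, `cv_accuracy`, `run_probe_demo`) and hyperparameter is a hypothetical choice for illustration. Only the sample count (270 texts, two nationality labels) is taken from the abstract.

```python
import numpy as np


def make_synthetic_layers(n_per_class=135, dim=16, signal=(0.0, 2.0, 0.7), seed=0):
    """Build toy 'hidden states' for several layers (assumption: synthetic data).

    Each layer is Gaussian noise plus a class-dependent shift along one fixed
    direction; the shift size stands in for how strongly a layer encodes the label.
    """
    rng = np.random.default_rng(seed)
    y = np.repeat([0.0, 1.0], n_per_class)  # 0 and 1 stand in for the two personas
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    layers = []
    for s in signal:
        X = rng.normal(size=(2 * n_per_class, dim))
        X += s * np.outer(2.0 * y - 1.0, direction)
        layers.append(X)
    return layers, y


def fit_logreg(X, y, lr=0.1, steps=300, l2=1e-2):
    """Fit L2-regularised binary logistic regression by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * float(np.mean(p - y))
    return w, b


def cv_accuracy(X, y, k=5, seed=0):
    """Mean k-fold cross-validated accuracy of the probe on one layer."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Standardise with training-fold statistics only, to avoid leakage.
        mu = X[train].mean(axis=0)
        sd = X[train].std(axis=0) + 1e-8
        w, b = fit_logreg((X[train] - mu) / sd, y[train])
        z_test = (X[test] - mu) / sd
        pred = (z_test @ w + b > 0).astype(float)
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))


def run_probe_demo():
    """Probe every toy layer, then rerun the best layer with shuffled labels."""
    layers, y = make_synthetic_layers()
    accs = [cv_accuracy(X, y) for X in layers]
    best = int(np.argmax(accs))
    shuffled = np.random.default_rng(1).permutation(y)
    baseline = cv_accuracy(layers[best], shuffled)
    return {"accuracies": accs, "best_layer": best, "shuffled_baseline": baseline}


if __name__ == "__main__":
    result = run_probe_demo()
    for layer, acc in enumerate(result["accuracies"]):
        print(f"layer {layer}: cv accuracy = {acc:.3f}")
    print(f"shuffled-label baseline at layer {result['best_layer']}: "
          f"{result['shuffled_baseline']:.3f}")
```

In this sketch, the shuffled-label run plays the role of a chance reference: if a probe scored well above chance on permuted labels, its accuracy on the real labels would reflect probe capacity rather than information in the activations.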

Problem

Research questions and friction points this paper is trying to address.

nationality encoding

culturally differentiated representations

persona-conditioned text

academic writing

language model probing

Innovation

Methods, ideas, or system contributions that make the work stand out.

probing

nationality encoding

hidden states