Mapping how LLMs debate societal issues when shadowing human personality traits, sociodemographics and social media behavior

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

career value

183K/year

🤖 AI Summary

This study addresses the lack of systematic datasets for investigating how large language models (LLMs) express divergent perspectives on controversial social issues across varied social identities and contexts. The authors construct CDS, a synthetic corpus comprising 190,000 records, which uniquely integrates personality traits, sociodemographic attributes, and role-specific prompts to elicit responses from 19 LLMs on four major societal controversies. Each response is annotated along 17 socio-psychological dimensions. Combining controlled prompt engineering with interpretable NLP techniques—such as textual formal thought network analysis—and an interactive visualization platform, this work enables fine-grained, cross-model, cross-identity, and cross-topic analyses of emotional tone and semantic framing. The resulting framework offers a novel toolkit for auditing the social sensitivity and potential biases embedded in LLM-generated content.

📝 Abstract

Large Language Models (LLMs) can strongly shape social discourse, yet datasets investigating how LLM outputs vary across controlled social and contextual prompting remain sparse. Cognitive Digital Shadows (CDS) is a 190,000-record synthetic corpus supporting analyses of LLM-generated discourse. Each CDS record is generated by one of 19 LLMs, prompted to shadow either a human persona or an AI-assistant role. CDS contains LLM responses on 4 controversial societal topics: vaccines/healthcare, social media disinformation, the gender gap in science, and STEM stereotypes. Persona-conditioned records encode 17 sociodemographic and psychological attributes, providing data linking LLMs' prompts, language, stances and reasoning. Texts are validated for topic anchoring and can support emotional analyses via interpretable NLP (e.g. textual forma mentis networks). CDS is enriched by a pooling platform with user-friendly dashboards, enabling easy, interactive group-level comparisons of emotional and semantic framing across personas, topics and models. The CDS prompting framework supports future audits of LLMs' bias, social sensitivity and alignment.

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

social discourse

sociodemographics

personality traits

controversial societal topics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cognitive Digital Shadows

persona-conditioned prompting

synthetic discourse corpus