Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction

📅 2026-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of empirical validation of the operational validity of large language models (LLMs) as "silicon-based agents" simulating social media user behavior. It introduces the Conditioned Comment Prediction (CCP) task to systematically evaluate 8B-scale open-weight models (Llama3.1, Qwen3, Ministral) in English, German, and Luxembourgish contexts, using real user digital traces. The work compares explicit versus implicit prompting and the impact of supervised fine-tuning (SFT). Findings reveal that while SFT improves surface-level textual alignment, it degrades semantic consistency. Moreover, explicit biographical prompts become redundant after fine-tuning, as models implicitly infer user traits directly from behavioral histories. These results uncover a decoupling between form and content under low-resource conditions and motivate a high-fidelity simulation paradigm that prioritizes behavioral traces over descriptive personas.

📝 Abstract
The transition of Large Language Models (LLMs) from exploratory tools to active "silicon subjects" in social science lacks extensive validation of operational validity. This study introduces Conditioned Comment Prediction (CCP), a task in which a model predicts how a user would comment on a given stimulus by comparing generated outputs with authentic digital traces. This framework enables a rigorous evaluation of current LLM capabilities with respect to the simulation of social media user behavior. We evaluated open-weight 8B models (Llama3.1, Qwen3, Ministral) in English, German, and Luxembourgish language scenarios. By systematically comparing prompting strategies (explicit vs. implicit) and the impact of Supervised Fine-Tuning (SFT), we identify a critical form vs. content decoupling in low-resource settings: while SFT aligns the surface structure of the text output (length and syntax), it degrades semantic grounding. Furthermore, we demonstrate that explicit conditioning (generated biographies) becomes redundant under fine-tuning, as models successfully perform latent inference directly from behavioral histories. Our findings challenge current "naive prompting" paradigms and offer operational guidelines prioritizing authentic behavioral traces over descriptive personas for high-fidelity simulation.
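The evaluation loop the abstract describes can be sketched minimally: condition a model on a user's behavioral history (implicit) or on a generated biography (explicit), elicit a predicted comment, then score it against the authentic comment on both a surface axis and a semantic axis. The function names and the two toy metrics below are illustrative assumptions, not the paper's actual implementation; the study would use a real LLM and stronger embedding-based semantic measures.

```python
def build_prompt(history, stimulus, biography=None):
    """Build a CCP-style conditioning prompt.
    Explicit conditioning prepends a biography; implicit conditioning
    relies on the behavioral trace (past comments) alone.
    """
    parts = []
    if biography:  # explicit conditioning
        parts.append(f"User biography: {biography}")
    parts.append("Past comments by this user:")
    parts.extend(f"- {c}" for c in history)
    parts.append(f"Post: {stimulus}")
    parts.append("Write the comment this user would leave:")
    return "\n".join(parts)

def surface_alignment(pred, gold):
    """Word-length ratio as a crude proxy for surface/form alignment."""
    lp, lg = len(pred.split()), len(gold.split())
    return min(lp, lg) / max(lp, lg) if max(lp, lg) else 1.0

def semantic_alignment(pred, gold):
    """Jaccard token overlap as a toy stand-in for semantic grounding."""
    a, b = set(pred.lower().split()), set(gold.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0
```

Under this framing, the paper's "form vs. content decoupling" would show up as SFT raising the surface score while the semantic score drops; redundancy of explicit conditioning would show up as `biography=None` prompts matching biography-augmented ones after fine-tuning.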
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
social media simulation
operational validity
Conditioned Comment Prediction
behavioral traces
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditioned Comment Prediction
Operational Validity
Supervised Fine-Tuning
Behavioral Simulation
Latent Inference