Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization

📅 2025-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work shows that large language models (LLMs) implicitly infer users' demographic attributes from stereotypical cues in dialogue, a behavior that prior work links to lower-quality responses for users assumed to belong to minority groups. Critically, for a number of groups the inference persists even when the user explicitly identifies with a different demographic group. To investigate the phenomenon systematically, the authors build an empirical framework that combines controlled synthetic conversations, analysis of intermediate-layer activations, and evaluation of the model's generated answers to targeted user questions, and they find stereotype-driven implicit personalization across multiple demographic groups. As mitigation, they train a linear probe on the model's internal representations and use it to steer those representations toward the explicitly stated identity. Evaluated on several mainstream LLMs, the intervention reduces average false identity inference rates by 47.3% and significantly improves both fairness and response quality.

📝 Abstract

Generative Large Language Models (LLMs) infer users' demographic information from subtle cues in the conversation -- a phenomenon called implicit personalization. Prior work has shown that such inferences can lead to lower-quality responses for users assumed to be from minority groups, even when no demographic information is explicitly provided. In this work, we systematically explore how LLMs respond to stereotypical cues using controlled synthetic conversations, analyzing the models' latent user representations through both model internals and generated answers to targeted user questions. Our findings reveal that LLMs do infer demographic attributes based on these stereotypical signals, and that for a number of groups these inferences persist even when the user explicitly identifies with a different demographic group. Finally, we show that this form of stereotype-driven implicit personalization can be effectively mitigated by intervening on the model's internal representations using a trained linear probe to steer them toward the explicitly stated identity. Our results highlight the need for greater transparency and control in how LLMs represent user identity.
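
To make the idea of reading a latent user representation with a linear probe concrete, here is a minimal sketch. It is not the authors' code: the "activations" are synthetic Gaussian vectors standing in for intermediate-layer hidden states, the binary label stands in for a demographic attribute, and all function names (`make_synthetic_activations`, `train_linear_probe`, `probe_accuracy`) are hypothetical. The probe itself is plain logistic regression trained with stochastic gradient descent.

```python
import math
import random


def make_synthetic_activations(n, dim, seed=0):
    """Assumption: fake 'hidden states' whose class is encoded along one direction."""
    rng = random.Random(seed)
    direction = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 1.0 if label == 1 else -1.0
        x = [rng.gauss(0.0, 1.0) + shift * d for d in direction]
        data.append((x, label))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def train_linear_probe(data, dim, lr=0.1, epochs=50):
    """Fit weights w and bias b so that sigmoid(w.x + b) predicts the label."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            grad = sigmoid(dot(w, x) + b) - y  # d(log-loss)/d(logit)
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def probe_accuracy(w, b, data):
    correct = sum((dot(w, x) + b >= 0.0) == (y == 1) for x, y in data)
    return correct / len(data)


train_set = make_synthetic_activations(400, dim=8, seed=0)
test_set = make_synthetic_activations(200, dim=8, seed=1)
w, b = train_linear_probe(train_set, dim=8)
accuracy = probe_accuracy(w, b, test_set)
print(f"held-out probe accuracy: {accuracy:.3f}")
```

In a real setting the input vectors would be activations extracted from a chosen model layer for each conversation. High held-out accuracy would suggest that the attribute is linearly decodable from that layer.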

Problem

Research questions and friction points this paper is trying to address.

LLMs infer demographics from implicit conversational cues

Stereotypical cues lead to biased responses for minorities

Mitigating bias via intervention on model representations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzing latent user representations via synthetic conversations

Mitigating bias with trained linear probe interventions

Steering model responses toward explicitly stated identities
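
The steering idea in the items above can be illustrated with a small sketch. The summary does not specify the exact intervention, so this is an assumption and not the authors' procedure: given a trained linear probe with weights `w` and bias `b`, the hidden vector `h` is shifted along `w` by the smallest amount that moves the probe's logit to a chosen target value. The names `probe_logit` and `steer_toward`, and the numbers used, are hypothetical.

```python
def probe_logit(w, b, h):
    """Linear probe score for hidden vector h; positive means the target class."""
    return sum(wi * hi for wi, hi in zip(w, h)) + b


def steer_toward(w, b, h, target_logit):
    """Return h shifted along w so that the probe logit equals target_logit.

    This is the minimum-norm edit: h' = h + ((t - (w.h + b)) / ||w||^2) * w.
    """
    norm_sq = sum(wi * wi for wi in w)
    if norm_sq == 0.0:
        raise ValueError("probe weights must be non-zero")
    scale = (target_logit - probe_logit(w, b, h)) / norm_sq
    return [hi + scale * wi for hi, wi in zip(h, w)]


# Hypothetical probe and hidden state (illustrative numbers only).
w = [0.5, -1.0, 2.0]
b = 0.1
h = [1.0, 2.0, -0.5]

before = probe_logit(w, b, h)
steered = steer_toward(w, b, h, target_logit=3.0)
after = probe_logit(w, b, steered)
print(f"logit before: {before:.2f}, after steering: {after:.2f}")
```

In practice such an edit would be applied to the model's activations during generation, for example through a forward hook at the probed layer. The target value controls how strongly the representation is pushed toward the explicitly stated identity.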

🔎 Similar Papers

Stereotype or Personalization? User Identity Biases Chatbot Recommendations