Artificial Impressions: Evaluating Large Language Model Behavior Through the Lens of Trait Impressions

📅 2025-10-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates whether large language models (LLMs) internally form implicit "artificial impressions", analogous to human stereotypes, and how such representations influence downstream behavior. The authors formalize the concept of an *artificial impression* and fit linear probes, grounded in the two-dimensional Stereotype Content Model (SCM), to decode these latent representations from hidden-layer activations. Although models report impressions inconsistently when prompted directly, the impressions are more consistently linearly decodable from hidden states, and they predict the quality of responses and the use of hedging in them. Further analysis shows that content, stylistic, and dialectal features of prompts modulate these impressions. The findings provide both a conceptual framework and an interpretable probing tool for studying implicit, structured bias in LLM behavior and for informing alignment and bias mitigation.

📝 Abstract
We introduce and study artificial impressions--patterns in LLMs' internal representations of prompts that resemble human impressions and stereotypes based on language. We fit linear probes on generated prompts to predict impressions according to the two-dimensional Stereotype Content Model (SCM). Using these probes, we study the relationship between impressions and downstream model behavior as well as prompt features that may inform such impressions. We find that LLMs inconsistently report impressions when prompted, but also that impressions are more consistently linearly decodable from their hidden representations. Additionally, we show that artificial impressions of prompts are predictive of the quality and use of hedging in model responses. We also investigate how particular content, stylistic, and dialectal features in prompts impact LLM impressions.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM behavior through human-like trait impressions and stereotypes
Studying how hidden representations predict response quality and hedging
Investigating how prompt features impact artificial impression formation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear probes decode impressions from hidden representations
Stereotype Content Model predicts model behavior patterns
Artificial impressions link to response quality and hedging
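The probing idea above can be sketched in a few lines. This is a minimal illustration, not the paper's actual pipeline: the "hidden states" and two-dimensional SCM targets (warmth, competence) below are synthetic stand-ins generated from a hypothetical linear map, and the ridge-regression probe, dimensions, and regularization strength are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: activations for 200 prompts (dim 64) and 2-D SCM
# targets (warmth, competence) from an unknown linear map plus noise --
# stand-ins for real LLM hidden states and impression ratings.
d, n = 64, 200
W_true = rng.normal(size=(d, 2))
X = rng.normal(size=(n, d))
Y = X @ W_true + 0.1 * rng.normal(size=(n, 2))

# Fit a linear probe by ridge regression: W = (X^T X + lam*I)^-1 X^T Y
lam = 1.0
W_probe = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Evaluate on held-out prompts: R^2 per SCM axis
X_test = rng.normal(size=(50, d))
Y_test = X_test @ W_true + 0.1 * rng.normal(size=(50, 2))
pred = X_test @ W_probe
r2 = 1 - ((Y_test - pred) ** 2).sum(0) / ((Y_test - Y_test.mean(0)) ** 2).sum(0)
print(r2)  # high R^2 on both axes -> targets are linearly decodable
```

A high held-out R^2 on both axes is what "linearly decodable" means operationally; in the paper's setting the targets would be SCM impression scores rather than synthetic labels.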