Artificial Impressions: Evaluating Large Language Model Behavior Through the Lens of Trait Impressions

📅 2025-10-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates whether large language models (LLMs) internally form implicit "artificial impressions", analogous to human stereotypes, and how such representations influence downstream behavior. The authors formalize the concept of an *artificial impression* and fit linear probes, grounded in the two-dimensional Stereotype Content Model (SCM), to decode these latent representations from hidden-layer activations. Although models report impressions inconsistently when prompted directly, the impressions are more consistently linearly decodable from hidden states, and they predict the quality of responses and the use of hedging in them. Further analysis shows that content, stylistic, and dialectal features of prompts modulate these impressions. The findings provide both a conceptual framework and an interpretable probing tool for studying implicit, structured bias in LLM behavior and for informing alignment and bias mitigation.

📝 Abstract
We introduce and study artificial impressions--patterns in LLMs' internal representations of prompts that resemble human impressions and stereotypes based on language. We fit linear probes on generated prompts to predict impressions according to the two-dimensional Stereotype Content Model (SCM). Using these probes, we study the relationship between impressions and downstream model behavior as well as prompt features that may inform such impressions. We find that LLMs inconsistently report impressions when prompted, but also that impressions are more consistently linearly decodable from their hidden representations. Additionally, we show that artificial impressions of prompts are predictive of the quality and use of hedging in model responses. We also investigate how particular content, stylistic, and dialectal features in prompts impact LLM impressions.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM behavior through human-like trait impressions and stereotypes
Studying how hidden representations predict response quality and hedging
Investigating how prompt features impact artificial impression formation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear probes decode impressions from hidden representations
Stereotype Content Model predicts model behavior patterns
Artificial impressions link to response quality and hedging
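The probing idea above can be sketched in a few lines. This is a minimal illustration, not the paper's actual pipeline: the "hidden states" and two-dimensional SCM targets (warmth, competence) below are synthetic stand-ins generated from a hypothetical linear map, and the ridge-regression probe, dimensions, and regularization strength are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: activations for 200 prompts (dim 64) and 2-D SCM
# targets (warmth, competence) from an unknown linear map plus noise --
# stand-ins for real LLM hidden states and impression ratings.
d, n = 64, 200
W_true = rng.normal(size=(d, 2))
X = rng.normal(size=(n, d))
Y = X @ W_true + 0.1 * rng.normal(size=(n, 2))

# Fit a linear probe by ridge regression: W = (X^T X + lam*I)^-1 X^T Y
lam = 1.0
W_probe = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Evaluate on held-out prompts: R^2 per SCM axis
X_test = rng.normal(size=(50, d))
Y_test = X_test @ W_true + 0.1 * rng.normal(size=(50, 2))
pred = X_test @ W_probe
r2 = 1 - ((Y_test - pred) ** 2).sum(0) / ((Y_test - Y_test.mean(0)) ** 2).sum(0)
print(r2)  # high R^2 on both axes -> targets are linearly decodable
```

A high held-out R^2 on both axes is what "linearly decodable" means operationally; in the paper's setting the targets would be SCM impression scores rather than synthetic labels.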