🤖 AI Summary
This study addresses the absence of validated tools for measuring sycophantic behavior in large language models (LLMs) during open-ended social conversations where no ground-truth responses exist. Building on a prior conceptualization of Social Sycophancy, the work proposes and validates a three-factor sycophancy scale (Uncritical Agreement, Obsequiousness, and Excitement) that assesses LLM behavior without relying on reference answers. Using psychometric methods, including exploratory and confirmatory factor analyses, and leveraging LLMs as automated raters, the scale demonstrates strong reliability and validity across 877 human participants evaluating multi-turn dialogues. Results show that LLMs prompt-tuned to be highly sycophantic score significantly higher across all dimensions, and that the distinct facets differentially shape user perceptions, revealing a nuanced relationship between sycophancy and perceived empathy. These findings offer empirical grounding for ethical AI design in human–AI interaction.
📝 Abstract
Large Language Model (LLM) sycophancy is a growing concern. The current literature has largely examined sycophancy in contexts with clear right and wrong answers, like coding. However, AI is increasingly being used for emotional support and interpersonal conversation, where no such ground truth exists. Building on a previous conceptualization of Social Sycophancy, this paper provides a psychometrically validated measure of sycophancy that relies on LLM behavior rather than comparisons with ground truth. We developed and validated the Social Sycophancy Scale in three samples (N = 877) and tested its applicability with automated methods. In each study, participants read conversations between an LLM and a user and rated the chatbot on a battery of items. Study 1 investigated an initial item pool derived from dictionary definitions and previous literature, serving as the exploratory base for the following studies. In Study 2, we used a revised item set to establish our scale, which was subsequently confirmed in Study 3 and tested using LLM raters in Study 4. Across studies, the data support a three-factor structure (Uncritical Agreement, Obsequiousness, and Excitement) with an underlying sycophantic construct. LLMs prompt-tuned to be highly sycophantic scored higher than their low-sycophancy counterparts on both overall sycophancy and its three facets across Studies 2 to 4. The nomological network of sycophancy revealed a consistent link with empathy, a pairing that raises uncomfortable questions about AI design, and a multivalent pattern: one facet was associated with favorable perceptions (Excitement), another with unfavorable perceptions (Obsequiousness), and a third with ambiguous perceptions (Uncritical Agreement). The Social Sycophancy Scale gives researchers the means to study sycophancy rigorously and to confront a genuine design tension: the warmth and empathy we want from AI may be precisely what makes it sycophantic.
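For readers less familiar with the reliability metrics behind scale validation, here is a minimal sketch of Cronbach's alpha, the standard internal-consistency coefficient reported in psychometric studies like this one. The rating matrix below is purely illustrative (hypothetical 1–7 Likert responses to four items of one facet), not data from the paper:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) rating matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of summed scale scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 6 participants rating one facet on 4 items (1-7 Likert)
ratings = np.array([
    [6, 7, 6, 7],
    [2, 1, 2, 2],
    [5, 5, 6, 5],
    [3, 2, 3, 3],
    [7, 6, 7, 7],
    [1, 2, 1, 1],
])
print(round(cronbach_alpha(ratings), 3))  # → 0.988
```

Values above roughly 0.7–0.8 are conventionally read as acceptable internal consistency; the exploratory and confirmatory factor analyses the paper describes would then test whether the items cluster into the three hypothesized facets.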