Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust

📅 2025-07-02
🤖 AI Summary
This study investigates the consistency between stated beliefs (“what is said”) and actual behavior (“what is done”) when large language models (LLMs) simulate human trust in role-playing scenarios. Method: Building on the Trust Game paradigm and an augmented GenAgents persona bank, the authors propose a quantifiable belief–behavior consistency metric and systematically examine the effects of belief type, timing of information disclosure, and prediction horizon, as well as the efficacy of imposing theory-informed priors. Contribution/Results: The study finds systematic inconsistency in LLMs’ trust-related responses at both the individual and population level; notably, even plausible stated beliefs do not guarantee stable behavioral alignment. This work establishes the first quantitative framework for evaluating belief–behavior consistency in LLM-based role-playing, offering a novel methodology and an empirical benchmark for modeling and aligning LLMs’ social behavior.

📝 Abstract
As LLMs are increasingly studied as role-playing agents to generate synthetic data for human behavioral research, ensuring that their outputs remain coherent with their assigned roles has become a critical concern. In this paper, we investigate how consistently LLM-based role-playing agents' stated beliefs about the behavior of the people they are asked to role-play ("what they say") correspond to their actual behavior during role-play ("how they act"). Specifically, we establish an evaluation framework to rigorously measure how well beliefs obtained by prompting the model can predict simulation outcomes in advance. Using an augmented version of the GenAgents persona bank and the Trust Game (a standard economic game used to quantify players' trust and reciprocity), we introduce a belief-behavior consistency metric and systematically investigate how it is affected by factors such as: (1) the types of beliefs we elicit from LLMs, like expected outcomes of simulations versus task-relevant attributes of the individual characters LLMs are asked to simulate; (2) when and how we present LLMs with relevant information about the Trust Game; and (3) how far into the future we ask the model to forecast its actions. We also explore how feasible it is to impose a researcher's own theoretical priors in the event that the originally elicited beliefs are misaligned with research objectives. Our results reveal systematic inconsistencies between LLMs' stated (or imposed) beliefs and the outcomes of their role-playing simulations, at both the individual and population level. Specifically, we find that, even when models appear to encode plausible beliefs, they may fail to apply them in a consistent way. These findings highlight the need to identify how and when LLMs' stated beliefs align with their simulated behavior, allowing researchers to use LLM-based agents appropriately in behavioral studies.
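For readers unfamiliar with the Trust Game the abstract relies on, a minimal sketch of one round follows. It assumes the conventional parameterization (a fixed investor endowment and a 3x multiplier on the transferred amount); the paper's exact parameters may differ.

```python
# Minimal sketch of one round of the standard Trust Game. The investor
# sends part of an endowment (a measure of trust); the amount is
# multiplied; the trustee returns a fraction (a measure of reciprocity).
# Endowment/multiplier values here are the conventional ones, assumed.

def trust_game(endowment: float, sent: float,
               multiplier: float, returned_frac: float):
    """Return (investor_payoff, trustee_payoff) for one round."""
    assert 0 <= sent <= endowment
    assert 0 <= returned_frac <= 1
    received = sent * multiplier          # amount the trustee receives
    returned = received * returned_frac   # amount sent back
    investor_payoff = endowment - sent + returned
    trustee_payoff = received - returned
    return investor_payoff, trustee_payoff
```

For example, with a 10-unit endowment, sending 5 at a 3x multiplier and a 50% return yields `trust_game(10, 5, 3, 0.5) == (12.5, 7.5)`.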
Problem

Research questions and friction points this paper is trying to address.

Measure consistency between LLM agents' stated beliefs and role-play behavior
Evaluate how belief types and information timing affect simulation outcomes
Assess feasibility of aligning LLM beliefs with researcher's theoretical priors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluate belief-behavior consistency in LLM agents
Introduce augmented GenAgents persona bank
Develop Trust Game-based consistency metric
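The consistency metric listed above could be operationalized along the following lines. This is a hypothetical sketch, not the paper's implementation: the function name, the use of normalized mean absolute deviation, and the score range are all illustrative assumptions. The idea is to elicit each agent's stated belief about how much its persona would send in the Trust Game, run the role-play simulation, and score the gap between the two.

```python
# Hypothetical belief-behavior consistency score: compare each agent's
# stated belief (predicted amount sent in the Trust Game) with its
# simulated behavior (amount actually sent). The normalized-MAD form
# is illustrative, not the paper's exact metric.

from statistics import mean

def consistency_score(stated: list, acted: list, endowment: float) -> float:
    """Return a score in [0, 1]: 1.0 = stated beliefs perfectly
    predict behavior, 0.0 = maximal divergence.

    stated -- amounts each agent *says* its persona would send
    acted  -- amounts each agent actually sends during role-play
    """
    assert len(stated) == len(acted) and endowment > 0
    # Mean absolute deviation, normalized by the endowment so scores
    # are comparable across game parameterizations.
    mad = mean(abs(s - a) for s, a in zip(stated, acted))
    return 1.0 - mad / endowment
```

A population-level variant could instead compare the distributions of stated and acted amounts (e.g., their means or a distributional distance), matching the paper's individual- versus population-level analysis.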