Can LLMs Simulate Personas with Reversed Performance? A Benchmark for Counterfactual Instruction Following

📅 2025-04-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current large language models (LLMs) struggle to simulate reversed-performance personas (e.g., low-competence students), limiting the diversity and practicality of virtual learning environments. Method: Focusing on mathematical reasoning, we introduce CounterFactBench, the first counterfactual instruction-following benchmark designed to systematically evaluate LLMs' ability to invert persona performance. We formalize "counterfactual instruction following" as a novel task and propose a multidimensional evaluation framework that jointly models capability levels (e.g., reasoning proficiency) and demographic attributes (e.g., race). Contribution/Results: Experiments across major open-weight and closed-source models, including OpenAI's o1, reveal consistent and significant failures in persona inversion, with performance deteriorating further under capability-attribute intersections. These findings expose a fundamental limitation in controllable persona modeling and underscore the need for new architectures and training paradigms that support faithful, equitable, and pedagogically grounded role simulation.

📝 Abstract
Large Language Models (LLMs) are increasingly used to simulate personas in virtual environments, leveraging their instruction-following capability. However, we discovered that even state-of-the-art LLMs cannot simulate personas with reversed performance (e.g., student personas with low proficiency in educational settings), which impairs simulation diversity and limits the practical applications of simulated environments. In this work, using mathematical reasoning as a representative scenario, we propose the first benchmark dataset for evaluating LLMs on simulating personas with reversed performance, a capability that we dub "counterfactual instruction following". We evaluate both open-weight and closed-source LLMs on this task and find that all LLMs, including the OpenAI o1 reasoning model, struggle to follow counterfactual instructions for simulating reversed-performance personas. Intersectionally simulating both the performance level and the race of a persona worsens the effect even further. These results highlight the challenges of counterfactual instruction following and the need for further research.
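To make the task concrete, below is a minimal, hypothetical Python sketch of what a counterfactual persona prompt and a simple inversion check could look like. The prompt template, the persona fields, the function names, and the scoring rule are all illustrative assumptions, not the paper's actual protocol or data.

```python
from typing import Optional

# A minimal, hypothetical sketch of the counterfactual instruction-following
# setup described above. The prompt template, persona fields, and scoring
# rule are illustrative assumptions, not the paper's exact protocol.

def build_counterfactual_prompt(question: str,
                                proficiency: str,
                                race: Optional[str] = None) -> str:
    """Compose a persona instruction asking the model to answer *as if* it
    had the given (possibly low) math proficiency, optionally intersected
    with a demographic attribute."""
    persona = f"a student with {proficiency} proficiency in mathematics"
    if race is not None:
        persona = f"a {race} student with {proficiency} proficiency in mathematics"
    return (
        f"You are role-playing {persona}. "
        "Answer the following problem exactly as that student would, "
        "including any mistakes such a student would plausibly make.\n\n"
        f"Problem: {question}"
    )

def inversion_gap(observed_accuracy: float, target_accuracy: float) -> float:
    """One simple way to quantify failure to invert performance: the distance
    between the simulated persona's accuracy and the instructed target level."""
    return abs(observed_accuracy - target_accuracy)

if __name__ == "__main__":
    print(build_counterfactual_prompt("What is 17 * 24?", proficiency="very low"))
    # A model instructed to behave at roughly 30% accuracy but scoring 95%
    # shows a large inversion gap, i.e., it failed to follow the
    # counterfactual instruction:
    print(inversion_gap(observed_accuracy=0.95, target_accuracy=0.30))
```

Under this framing, the paper's central finding is that even strong reasoning models keep such a gap large: they answer well despite being instructed to perform poorly.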
Problem

Research questions and friction points this paper is trying to address.

LLMs struggle to simulate personas with reversed performance
Lack of a benchmark for evaluating counterfactual instruction following
Performance degrades further when reversed personas intersect with demographic attributes
Innovation

Methods, ideas, or system contributions that make the work stand out.

First benchmark dataset for simulating reversed-performance personas
Formalizes and evaluates the counterfactual instruction-following capability
Finding that intersectional simulation (performance level plus race) worsens the effect further
Sai Adith Senthil Kumar
Department of Computer Science, George Mason University
Hao Yan
Department of Computer Science, George Mason University
Saipavan Perepa
Department of Computer Science, George Mason University
Murong Yue
George Mason University
Ziyu Yao
Department of Computer Science, George Mason University