RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the contextual sensitivity of large language models (LLMs) in social dilemmas involving role conflict—i.e., their ability to recognize and balance multiple, ambiguous, and urgent social role expectations. To this end, we introduce the first benchmark dataset of this kind, comprising over 13,000 realistic scenarios that span 65 social roles and systematically vary their associated duties and contextual urgency levels. We propose a three-stage controllable generation pipeline and the first dedicated evaluation framework for role conflict. A systematic assessment of ten mainstream LLMs reveals that while models exhibit basic situational responsiveness, their decisions are strongly shaped by entrenched societal stereotypes—particularly those tied to family, occupation, and religion, with a pronounced preference for male and Abrahamic religious roles. Overall, contextual sensitivity remains inadequate. This study is the first to quantitatively expose how social role biases fundamentally impair LLMs' contextual reasoning, establishing a novel paradigm for evaluating social cognition in trustworthy AI.

📝 Abstract
Humans often encounter role conflicts -- social dilemmas where the expectations of multiple roles clash and cannot be simultaneously fulfilled. As large language models (LLMs) become increasingly influential in human decision-making, understanding how they behave in complex social situations is essential. While previous research has evaluated LLMs' social abilities in contexts with predefined correct answers, role conflicts represent inherently ambiguous social dilemmas that require contextual sensitivity: the ability to recognize and appropriately weigh situational cues that can fundamentally alter decision priorities. To address this gap, we introduce RoleConflictBench, a novel benchmark designed to evaluate LLMs' contextual sensitivity in complex social dilemmas. Our benchmark employs a three-stage pipeline to generate over 13K realistic role conflict scenarios across 65 roles, systematically varying their associated expectations (i.e., their responsibilities and obligations) and situational urgency levels. By analyzing model choices across 10 different LLMs, we find that while LLMs show some capacity to respond to these contextual cues, this sensitivity is insufficient. Instead, their decisions are predominantly governed by a powerful, inherent bias related to social roles rather than situational information. Our analysis quantifies these biases, revealing a dominant preference for roles within the Family and Occupation domains, as well as a clear prioritization of male roles and Abrahamic religions across most evaluated models.
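The benchmark's core idea — pairing two roles with clashing obligations, systematically varying the urgency of each, and then checking whether a model's choice tracks the urgency cue rather than the role identity — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the `RoleScenario` structure, the `sensitivity` metric, and the toy `urgency_follower` chooser are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical scenario structure: two roles, each with a duty and an
# urgency level, mirroring the benchmark's systematically varied cues.
@dataclass(frozen=True)
class RoleScenario:
    role_a: str
    duty_a: str
    urgency_a: str
    role_b: str
    duty_b: str
    urgency_b: str

    def prompt(self) -> str:
        return (
            f"You are both a {self.role_a} and a {self.role_b}. "
            f"As a {self.role_a}, you must {self.duty_a} ({self.urgency_a} urgency). "
            f"As a {self.role_b}, you must {self.duty_b} ({self.urgency_b} urgency). "
            "Which duty do you fulfill first?"
        )

def sensitivity(choices: dict) -> float:
    """Toy contextual-sensitivity score: the fraction of scenarios where
    swapping the two urgency cues also flips the chosen role."""
    flips = total = 0
    for s, choice in choices.items():
        mirrored = RoleScenario(s.role_a, s.duty_a, s.urgency_b,
                                s.role_b, s.duty_b, s.urgency_a)
        if s.urgency_a != s.urgency_b and mirrored in choices:
            total += 1
            if choices[mirrored] != choice:
                flips += 1
    return flips / total if total else 0.0

# Toy "model" that always follows the urgency cue (perfectly sensitive).
def urgency_follower(s: RoleScenario) -> str:
    return s.role_a if s.urgency_a == "high" else s.role_b

scenarios = [
    RoleScenario("parent", "pick up your child", u_a,
                 "nurse", "cover an ER shift", u_b)
    for u_a, u_b in product(["low", "high"], repeat=2)
]
choices = {s: urgency_follower(s) for s in scenarios}
print(sensitivity(choices))  # a chooser that tracks urgency scores 1.0
```

A role-biased model (e.g., one that always picks "parent" regardless of urgency) would score 0.0 on this metric, which is the kind of gap between situational responsiveness and role bias the paper quantifies.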
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' contextual sensitivity in role conflict scenarios
Assessing how LLMs handle ambiguous social dilemmas with conflicting expectations
Quantifying inherent biases in LLM decisions across social roles and domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed benchmark for evaluating LLM contextual sensitivity
Generated diverse role conflict scenarios using systematic pipeline
Quantified inherent social biases across multiple language models