🤖 AI Summary
Existing LLM evaluation frameworks overlook how contextual framing (such as responsibility attribution, temporal scale, and social role) affects perceived rationality in decision-making. Method: We propose a dynamic evaluation paradigm built on programmatically generated vignettes. Holding the underlying game-theoretic structure fixed, we systematically perturb semantic dimensions of the vignette frame using controllable text generation, formal game modeling, and a multi-dimensional perturbation design. Contribution/Results: Large-scale response analysis reveals extreme sensitivity of LLMs to framing: under identical game logic, context shifts reduce decision consistency by up to 68%, and 92% of behavioral deviations are accurately predicted from frame-level features. This work advances LLM evaluation from static capability assessment toward context-adaptive paradigms, highlighting the critical role of semantic framing in rationality perception.
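The core idea, holding a fixed game structure constant while crossing semantic framing dimensions to generate vignette variants, can be sketched as follows. This is an illustrative sketch only: the `generate_vignettes` function, the template, and the dimension values are assumptions for exposition, not the paper's actual generator.

```python
from itertools import product

# Fixed game-theoretic structure (a Prisoner's-Dilemma-style payoff table,
# chosen here purely for illustration). Every vignette shares these payoffs.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}

# Perturbed semantic dimensions (hypothetical example values).
FRAMES = {
    "responsibility": ["you alone decide", "your team shares the decision"],
    "temporal_scale": ["effects felt tomorrow", "effects felt in a decade"],
    "social_role": ["as a manager", "as a neighbor"],
}

def generate_vignettes(template: str) -> list[dict]:
    """Cross all framing dimensions; every vignette keeps the same game."""
    vignettes = []
    for combo in product(*FRAMES.values()):
        frame = dict(zip(FRAMES.keys(), combo))
        vignettes.append({
            "text": template.format(**frame),
            "frame": frame,
            "payoffs": PAYOFFS,  # identical game logic across all frames
        })
    return vignettes

template = ("Acting {social_role}, where {responsibility} and "
            "{temporal_scale}, do you cooperate or defect?")
vignettes = generate_vignettes(template)
print(len(vignettes))  # 2 * 2 * 2 = 8 framings of one underlying game
```

Because the payoff structure is shared across all variants, any divergence in a model's decisions can be attributed to the frame rather than the game.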
📝 Abstract
Large Language Models (LLMs) are increasingly deployed across diverse contexts to support decision-making. While existing evaluations effectively probe latent model capabilities, they often overlook how context framing shapes perceived rational decision-making. In this study, we introduce a novel evaluation framework that systematically varies evaluation instances along key features and procedurally generates vignettes to produce highly varied scenarios. By analyzing decision-making patterns across contexts that share the same underlying game structure, we uncover significant contextual variability in LLM responses. Our findings demonstrate that this variability is largely predictable yet highly sensitive to framing effects. These results underscore the need for dynamic, context-aware evaluation methodologies for real-world deployments.
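A simple way to quantify the contextual variability described above is a decision-consistency score over a model's answers to all framings of one game. The metric below is an assumed definition for illustration (fraction of responses agreeing with the modal decision), not necessarily the one used in the paper.

```python
from collections import Counter

def decision_consistency(decisions: list[str]) -> float:
    """Share of frame-variant responses that match the modal decision
    for the same underlying game (1.0 = fully frame-invariant)."""
    if not decisions:
        return 0.0
    (_, modal_count), = Counter(decisions).most_common(1)
    return modal_count / len(decisions)

# Hypothetical responses of one model to 8 framings of the same game.
responses = ["cooperate"] * 5 + ["defect"] * 3
print(decision_consistency(responses))  # 0.625
```

Under a metric of this kind, a drop from a near-1.0 baseline toward chance-level agreement would correspond to the large consistency reductions reported.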