🤖 AI Summary
Existing work in AI psychometrics predominantly repurposes human personality inventories (e.g., Big Five, HEXACO) or ad hoc role definitions, which distorts behavior and limits domain adaptability. To address this, we propose the first Situational Judgment Test (SJT) framework specifically designed for AI systems, integrating industrial-organizational psychology and personality theory to construct fine-grained, socioemotionally capable virtual personas. Our method incorporates demographic prior modeling and autobiographical narrative generation, coupled with Pydantic-based structured generation, to make AI personality modeling and behavioral analysis interpretable and reproducible. We instantiate this framework in a law enforcement assistant scenario and curate a large-scale benchmark of 8,500 virtual personas, 4,000 situational judgment items, and 300,000 AI responses, spanning eight archetype categories and eleven competency dimensions. The dataset and all code are to be released publicly.
📝 Abstract
AI psychometrics evaluates AI systems in roles that traditionally require emotional judgment and ethical consideration. Prior work often reuses human trait inventories (Big Five, HEXACO) or ad hoc personas, limiting behavioral realism and domain relevance. We propose a framework that (1) uses situational judgment tests (SJTs) built from realistic scenarios to probe domain-specific competencies; (2) integrates industrial-organizational and personality psychology to design sophisticated personas that include behavioral and psychological descriptors, life history, and social and emotional functions; and (3) employs structured generation with population demographic priors and memoir-inspired narratives, encoded with Pydantic schemas. In a law enforcement assistant case study, we construct a rich dataset of personas drawn from 8 persona archetypes and SJTs covering 11 attributes, and analyze behaviors across subpopulation and scenario slices. The dataset spans 8,500 personas, 4,000 SJTs, and 300,000 responses. We will release the dataset and all code to the public.
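
To make step (3) concrete, here is a minimal Python sketch of how a persona might be encoded with a Pydantic schema and seeded from a demographic prior. It is an illustration under stated assumptions, not the authors' released code: the field names, the archetype labels, the age bands and their weights, and the helpers `sample_age` and `parse_persona` are all hypothetical, since the abstract names neither the 8 archetypes nor the priors it uses.

```python
import json
import random
from typing import List, Literal

from pydantic import BaseModel, Field, ValidationError

# Hypothetical archetype labels: the abstract mentions 8 archetypes
# but does not name them, so two placeholders stand in here.
Archetype = Literal["archetype_a", "archetype_b"]


class Persona(BaseModel):
    """Hypothetical persona schema; every field name is an assumption."""

    persona_id: str
    archetype: Archetype
    age: int = Field(ge=18, le=90)
    life_history: str = Field(min_length=1)  # memoir-style narrative text
    traits: List[str]


# Hypothetical demographic prior over age bands (weights sum to 1).
AGE_BAND_PRIOR = {(18, 34): 0.3, (35, 54): 0.4, (55, 90): 0.3}


def sample_age(rng: random.Random) -> int:
    """Pick an age band according to the prior, then an age inside it."""
    bands = list(AGE_BAND_PRIOR)
    weights = [AGE_BAND_PRIOR[band] for band in bands]
    low, high = rng.choices(bands, weights=weights, k=1)[0]
    return rng.randint(low, high)


def parse_persona(raw_json: str) -> Persona:
    """Validate generated JSON against the schema.

    Raises pydantic.ValidationError if a field is missing or out of range.
    """
    return Persona(**json.loads(raw_json))


# Stand-in for text a language model would generate; not data from the paper.
rng = random.Random(0)
raw = json.dumps(
    {
        "persona_id": "p-0001",
        "archetype": "archetype_a",
        "age": sample_age(rng),
        "life_history": "Grew up in a small town and later moved for work.",
        "traits": ["conscientious", "reserved"],
    }
)
persona = parse_persona(raw)
print(persona.persona_id, persona.archetype, persona.age)
```

Validating each generated record against a schema means malformed outputs (an unknown archetype, an out-of-range age, an empty narrative) are rejected at parse time instead of silently entering the dataset, which is one plausible reading of how structured generation supports reproducibility here.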