🤖 AI Summary
Background: Conventional development of Chinese-language Personality Situational Judgment Tests (PSJTs) is time-intensive, susceptible to bias, and difficult to scale. Method: This study systematically validates GPT-4 for autonomously generating high-quality PSJTs, employing a validity-optimized structured prompt template and temperature tuning to ensure both content validity and psychometric rigor. Contribution/Results: Empirical evaluation shows that the AI-generated PSJT achieves strong internal consistency (Cronbach’s α > 0.8) and criterion-related validity across all Five-Factor Model dimensions, with a clear factorial structure. Notably, its overall psychometric performance surpasses that of expert-crafted versions. The approach drastically accelerates test development while ensuring reproducibility and scalability, offering a resource-efficient paradigm for the rapid construction of personality assessment tools in low-resource settings.
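The internal-consistency figure cited above (Cronbach’s α > 0.8) is computed from item-level scores via α = (k/(k−1))·(1 − Σσ²ᵢ/σ²ₜ). A minimal sketch with a fabricated score matrix (the data below are illustrative only, not from the study):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for k item-score lists over the same n respondents."""
    k = len(items)
    # Each respondent's total score is the sum of their item scores.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

# Fabricated 4-item x 5-respondent matrix, purely for illustration.
scores = [
    [3, 4, 5, 4, 3],
    [3, 5, 5, 4, 2],
    [2, 4, 4, 5, 3],
    [3, 4, 5, 5, 2],
]
print(round(cronbach_alpha(scores), 3))  # 0.922
```

Values above roughly 0.8, as reported for the AI-generated PSJT, are conventionally read as good internal consistency.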
📝 Abstract
Personality assessment, particularly through situational judgment tests (SJTs), is a vital tool for psychological research, talent selection, and educational evaluation. This study explores the potential of GPT-4, a state-of-the-art large language model (LLM), to automate the generation of personality situational judgment tests (PSJTs) in Chinese. Traditional SJT development is labor-intensive and prone to biases, while GPT-4 offers a scalable, efficient alternative. Two studies were conducted: Study 1 evaluated the impact of prompt design and temperature settings on content validity, finding that optimized prompts with a temperature of 1.0 produced creative and accurate items. Study 2 assessed the psychometric properties of GPT-4-generated PSJTs, revealing that they demonstrated satisfactory reliability and validity, surpassing the performance of manually developed tests in measuring the Big Five personality traits. This research highlights GPT-4's effectiveness in developing high-quality PSJTs, providing a scalable and innovative method for psychometric test development. These findings expand the possibilities of automatic item generation and the application of LLMs in psychology, and offer practical implications for streamlining test development processes in resource-limited settings.
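Study 1's finding, that a structured prompt combined with a temperature of 1.0 yields creative yet accurate items, can be sketched as a chat-completion-style request. The paper's actual prompt template is not reproduced here; the function name, prompt wording, and payload shape below are hypothetical:

```python
def build_psjt_request(trait: str, n_items: int, temperature: float = 1.0) -> dict:
    """Assemble a hypothetical chat-completion payload asking GPT-4 to
    draft situational judgment items targeting one Big Five trait."""
    prompt = (
        f"Write {n_items} Chinese-language situational judgment test items "
        f"measuring the Big Five trait '{trait}'. Each item should present "
        "a realistic scenario followed by four response options."
    )
    return {
        "model": "gpt-4",
        # Study 1 found temperature 1.0 balanced creativity and accuracy.
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_psjt_request("Conscientiousness", n_items=5)
print(request["temperature"])  # 1.0
```

One such request per Big Five dimension, with the generated items then screened for content validity, reflects the two-stage workflow the abstract describes.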