🤖 AI Summary
Background: Conventional development of Chinese-language Personality Situational Judgment Tests (PSJTs) is time-intensive, susceptible to bias, and difficult to scale. Method: This study systematically validates GPT-4 for autonomously generating high-quality PSJTs, employing a validity-optimized structured prompt template and temperature tuning to ensure both content validity and psychometric rigor. Contribution/Results: Empirical evaluation shows that the AI-generated PSJT achieves strong internal consistency (Cronbach’s α > 0.8) and criterion-related validity across all Five-Factor Model dimensions, with a clear factorial structure. Notably, its overall psychometric performance surpasses that of expert-crafted versions. The approach drastically accelerates test development while ensuring reproducibility and scalability, offering a resource-efficient paradigm for the rapid construction of personality assessment tools in low-resource settings.
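The internal-consistency figure cited above (Cronbach’s α > 0.8) is computed from item-level scores via α = (k/(k−1))·(1 − Σσ²ᵢ/σ²ₜ). A minimal sketch with a fabricated score matrix (the data below are illustrative only, not from the study):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for k item-score lists over the same n respondents."""
    k = len(items)
    # Each respondent's total score is the sum of their item scores.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(pvariance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

# Fabricated 4-item x 5-respondent matrix, purely for illustration.
scores = [
    [3, 4, 5, 4, 3],
    [3, 5, 5, 4, 2],
    [2, 4, 4, 5, 3],
    [3, 4, 5, 5, 2],
]
print(round(cronbach_alpha(scores), 3))  # 0.922
```

Values above roughly 0.8, as reported for the AI-generated PSJT, are conventionally read as good internal consistency.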
📝 Abstract
Personality assessment, particularly through situational judgment tests (SJTs), is a vital tool for psychological research, talent selection, and educational evaluation. This study explores the potential of GPT-4, a state-of-the-art large language model (LLM), to automate the generation of personality situational judgment tests (PSJTs) in Chinese. Traditional SJT development is labor-intensive and prone to biases, while GPT-4 offers a scalable, efficient alternative. Two studies were conducted: Study 1 evaluated the impact of prompt design and temperature settings on content validity, finding that optimized prompts with a temperature of 1.0 produced creative and accurate items. Study 2 assessed the psychometric properties of GPT-4-generated PSJTs, revealing that they demonstrated satisfactory reliability and validity, surpassing the performance of manually developed tests in measuring the Big Five personality traits. This research highlights GPT-4's effectiveness in developing high-quality PSJTs, providing a scalable and innovative method for psychometric test development. These findings expand the possibilities of automatic item generation and the application of LLMs in psychology, and offer practical implications for streamlining test development processes in resource-limited settings.
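Study 1's finding, that a structured prompt combined with a temperature of 1.0 yields creative yet accurate items, can be sketched as a chat-completion-style request. The paper's actual prompt template is not reproduced here; the function name, prompt wording, and payload shape below are hypothetical:

```python
def build_psjt_request(trait: str, n_items: int, temperature: float = 1.0) -> dict:
    """Assemble a hypothetical chat-completion payload asking GPT-4 to
    draft situational judgment items targeting one Big Five trait."""
    prompt = (
        f"Write {n_items} Chinese-language situational judgment test items "
        f"measuring the Big Five trait '{trait}'. Each item should present "
        "a realistic scenario followed by four response options."
    )
    return {
        "model": "gpt-4",
        # Study 1 found temperature 1.0 balanced creativity and accuracy.
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_psjt_request("Conscientiousness", n_items=5)
print(request["temperature"])  # 1.0
```

One such request per Big Five dimension, with the generated items then screened for content validity, reflects the two-stage workflow the abstract describes.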