A Novel Framework for Augmenting Rating Scale Tests with LLM-Scored Text Data

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional psychological assessments rely on structured rating scales, limiting their ability to capture the nuanced mental representations expressed in natural language. Method: We propose an LLM-augmented assessment framework that uses large language models to score open-ended textual responses without supervision. Leveraging item response theory (IRT), we quantify the information content of LLM-generated items and dynamically integrate the highest-information items with conventional scale items to construct an enhanced assessment. Contribution/Results: Validated on depression assessment with both real and synthetic data, the method achieves significant gains in measurement precision without human annotation or expert-defined rules. Information-theoretic analysis shows gains equivalent to adding 6.3 (real data) or 16.0 (synthetic data) items to a 19-item baseline scale, demonstrating the framework's efficacy and scalability.
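The item-selection step above can be sketched with a toy example. This is a minimal illustration, assuming a two-parameter logistic (2PL) IRT model for binary LLM-scored items; the candidate item names and their discrimination/difficulty parameters are hypothetical, and the paper's actual IRT model and selection criterion may differ.

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing a binary item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at latent-trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def expected_information(a, b, grid):
    """Average item information over a grid of theta values (uniform weight)."""
    return sum(item_information(t, a, b) for t in grid) / len(grid)

# Hypothetical calibrated (a, b) parameters for candidate LLM-scored items.
candidates = {
    "rumination":   (1.8, 0.2),
    "hopelessness": (2.1, 0.8),
    "sleep":        (0.9, -0.4),
}
grid = [i / 10 for i in range(-30, 31)]  # theta from -3 to 3

# Rank candidates by expected information; keep the top items for augmentation.
ranked = sorted(candidates,
                key=lambda k: expected_information(*candidates[k], grid),
                reverse=True)
```

Under these illustrative parameters, the high-discrimination "hopelessness" item ranks first and would be retained for the augmented test.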

📝 Abstract
Psychological assessments typically rely on structured rating scales, which cannot incorporate the rich nuance of a respondent's natural language. This study leverages recent LLM advances to harness qualitative data within a novel conceptual framework, combining LLM-scored text and traditional rating-scale items to create an augmented test. We demonstrate this approach using depression as a case study, developing and assessing the framework on a real-world sample of upper secondary students (n=693) and corresponding synthetic dataset (n=3,000). On held-out test sets, augmented tests achieved statistically significant improvements in measurement precision and accuracy. The information gain from the LLM items was equivalent to adding between 6.3 (real data) and 16.0 (synthetic data) items to the original 19-item test. Our approach marks a conceptual shift in automated scoring that bypasses its typical bottlenecks: instead of relying on pre-labelled data or complex expert-created rubrics, we empirically select the most informative LLM scoring instructions based on calculations of item information. This framework provides a scalable approach for leveraging the growing stream of transcribed text to enhance traditional psychometric measures, and we discuss its potential utility in clinical health and beyond.
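The "equivalent items" figure reported in the abstract can be unpacked with simple arithmetic: if test information scales roughly with item count, an information gain can be expressed as the number of average baseline items that would produce the same gain. The numbers below are illustrative only, chosen to reproduce the paper's real-data figure of roughly 6.3 items.

```python
def equivalent_added_items(info_base, info_aug, n_items):
    """Express an information gain as the number of extra average-strength
    items the baseline test would need to reach the same total information."""
    per_item = info_base / n_items            # average information per baseline item
    return (info_aug - info_base) / per_item

# Illustrative values: a baseline information of 19.0 over 19 items and an
# augmented total of 25.3 imply a gain equivalent to ~6.3 baseline items.
gain = equivalent_added_items(info_base=19.0, info_aug=25.3, n_items=19)
```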
Problem

Research questions and friction points this paper is trying to address.

Augmenting rating scale tests with LLM-scored text data to enhance psychological assessments
Improving measurement precision and accuracy by combining qualitative and quantitative data
Developing a scalable framework for incorporating natural language into psychometric measures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining LLM-scored text with traditional rating scales
Empirically selecting LLM scoring instructions via item information
Bypassing pre-labelled data requirements through automated scoring
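The pipeline implied by these points can be sketched end to end. Everything here is a hypothetical stand-in: `llm_score` uses a trivial keyword rule in place of an actual LLM call, and the candidate scoring instructions are invented for illustration; in the framework itself, each instruction's resulting item column would be IRT-calibrated and filtered by item information before being merged with the rating scale.

```python
# Candidate LLM scoring instructions (hypothetical examples).
CANDIDATE_INSTRUCTIONS = [
    "Rate 0/1 whether the response expresses hopelessness.",
    "Rate 0/1 whether the response mentions sleep problems.",
]

def llm_score(instruction, text):
    """Stand-in for an LLM call: returns a binary item score via keywords."""
    keyword = "hopeless" if "hopelessness" in instruction else "sleep"
    return 1 if keyword in text.lower() else 0

responses = [
    "I feel hopeless most days",
    "I sleep fine",
    "Nothing feels worth it",
]

# One column of binary scores per candidate instruction; these columns are
# the "LLM items" that would subsequently be calibrated and selected.
scored = {instr: [llm_score(instr, r) for r in responses]
          for instr in CANDIDATE_INSTRUCTIONS}
```

No pre-labelled training data enters this loop: the instructions are evaluated empirically by how much measurement information their score columns contribute.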