Neural network embeddings recover value dimensions from psychometric survey items on par with human data

📅 2025-09-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses two longstanding challenges in psychometrics: reliance on labor-intensive human data collection and the difficulty of obtaining negative correlations between dimensions from embeddings. It proposes SQuID (Survey and Questionnaire Item Embeddings Differentials), a method that represents questionnaire items with semantic embeddings from large language models (LLMs) that have not been fine-tuned for the domain, then recovers latent value dimensions by analyzing embedding differentials. Its key contribution is showing that pre-trained semantic embeddings, without domain-specific fine-tuning, can reproduce the negative correlations between value dimensions found in human rater judgments on the Revised Portrait Value Questionnaire (PVQ-RR). The embedding-based approach explains 55% of the variance in dimension-dimension similarities relative to human data, and its multidimensional scaling configurations show fair factor congruence with those from human judgments (factor congruence coefficient > 0.92) while largely following the underlying theory. The approach offers advantages in cost, scalability and flexibility, and is presented as a complement to traditional survey-based psychometric methods.

📝 Abstract

This study introduces "Survey and Questionnaire Item Embeddings Differentials" (SQuID), a novel methodological approach that enables neural network embeddings to effectively recover latent dimensions from psychometric survey items. We demonstrate that embeddings derived from large language models, when processed with SQuID, can recover the structure of human values obtained from human rater judgments on the Revised Portrait Value Questionnaire (PVQ-RR). Our experimental validation compares multiple embedding models across a number of evaluation metrics. Unlike previous approaches, SQuID successfully addresses the challenge of obtaining negative correlations between dimensions without requiring domain-specific fine-tuning. Quantitative analysis reveals that our embedding-based approach explains 55% of variance in dimension-dimension similarities compared to human data. Multidimensional scaling configurations from both types of data show fair factor congruence coefficients and largely follow the underlying theory. These results demonstrate that semantic embeddings can effectively replicate psychometric structures previously established through extensive human surveys. The approach offers substantial advantages in cost, scalability and flexibility while maintaining comparable quality to traditional methods. Our findings have significant implications for psychometrics and social science research, providing a complementary methodology that could expand the scope of human behavior and experience represented in measurement tools.
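The two headline metrics in the abstract can be illustrated with a short sketch. It assumes that "variance explained" means the squared Pearson correlation between the off-diagonal entries of two dimension-dimension similarity matrices, and that factor congruence is Tucker's congruence coefficient; the abstract does not spell out either computation, so both are assumptions. The function names and the toy matrices are hypothetical and are not data from the paper.

```python
import numpy as np


def off_diagonal(sim):
    """Return the upper-triangle entries (diagonal excluded) of a square matrix."""
    sim = np.asarray(sim, dtype=float)
    rows, cols = np.triu_indices(sim.shape[0], k=1)
    return sim[rows, cols]


def variance_explained(sim_a, sim_b):
    """Squared Pearson correlation between the off-diagonal similarities."""
    r = np.corrcoef(off_diagonal(sim_a), off_diagonal(sim_b))[0, 1]
    return float(r ** 2)


def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two coordinate vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))


# Made-up 3x3 similarity matrices, for illustration only (not results).
human_sim = np.array([[1.0, 0.5, -0.4],
                      [0.5, 1.0, 0.1],
                      [-0.4, 0.1, 1.0]])
model_sim = np.array([[1.0, 0.6, -0.2],
                      [0.6, 1.0, 0.2],
                      [-0.2, 0.2, 1.0]])

r_squared = variance_explained(human_sim, model_sim)
# Congruence of one toy MDS axis with a rescaled copy of itself.
phi = tucker_congruence([0.8, -0.3, 0.1], [1.6, -0.6, 0.2])
```

Tucker's coefficient ignores overall scale, which is why a rescaled copy of an axis yields perfect congruence; in practice the two configurations would first need to be rotated into alignment before comparing axes.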

Problem

Research questions and friction points this paper is trying to address.

Recovering latent dimensions from psychometric survey items using neural embeddings

Addressing negative correlation challenges without domain-specific fine-tuning

Explaining the variance in dimension-dimension similarities relative to human data (55% achieved)
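Why negative correlations are a challenge can be seen in a toy example. When all vectors share a large common component, raw cosine similarities are all positive; subtracting the mean vector before comparing lets opposing directions show up as negative values. This is a generic illustration under assumed toy data and is not the SQuID procedure itself, which this page does not describe in detail. The function names and vectors are hypothetical.

```python
import numpy as np


def cosine_matrix(vectors):
    """Pairwise cosine similarities between the rows of `vectors`."""
    v = np.asarray(vectors, dtype=float)
    unit = v / np.linalg.norm(v, axis=1, keepdims=True)
    return unit @ unit.T


def centered_cosine_matrix(vectors):
    """Cosine similarities after subtracting the mean row (an assumed, generic step)."""
    v = np.asarray(vectors, dtype=float)
    return cosine_matrix(v - v.mean(axis=0, keepdims=True))


# Toy "embeddings": a large shared component plus small distinguishing ones.
toy = np.array([[10.0, 1.0, 0.0],
                [10.0, -1.0, 0.0],
                [10.0, 0.0, 1.0]])

raw = cosine_matrix(toy)                # every entry is positive
centered = centered_cosine_matrix(toy)  # rows 0 and 1 now oppose each other
```

After centering, the first two toy vectors point in nearly opposite directions, so their similarity becomes negative even though their raw cosine similarity is close to 1.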

Innovation

Methods, ideas, or system contributions that make the work stand out.

SQuID method enables neural network embeddings to recover latent dimensions from survey items

LLM embeddings replicate human value structures

Negative correlations between dimensions obtained without domain-specific fine-tuning

🔎 Similar Papers

Measuring Human and AI Values based on Generative Psychometrics with Large Language Models