🤖 AI Summary
This study proposes an unsupervised framework to automatically identify inattentive respondents who provide random or low-effort answers in behavioral and social science surveys. The approach jointly leverages geometric reconstruction via autoencoders and probabilistic dependency modeling through Chow–Liu trees to assess response consistency, enhanced by a novel “percentile loss” to improve robustness against outliers. The work reveals a “psychometric–machine learning alignment” phenomenon: questionnaire structures exhibiting high internal consistency inherently facilitate effective algorithmic detection of data quality issues. Experiments across nine real-world, heterogeneous survey datasets demonstrate that detection performance is primarily governed by questionnaire structure rather than model complexity, with linear models already achieving strong discriminative power when applied to high-quality scales.
📝 Abstract
The integrity of behavioral and social-science surveys depends on detecting inattentive respondents who provide random or low-effort answers. Traditional safeguards, such as attention checks, are often costly, reactive, and inconsistent. We propose a unified, label-free framework for inattentiveness detection that scores response coherence using complementary unsupervised views: geometric reconstruction (Autoencoders) and probabilistic dependency modeling (Chow-Liu trees). While we introduce a "Percentile Loss" objective to improve Autoencoder robustness against anomalies, our primary contribution is identifying the structural conditions that enable unsupervised quality control. Across nine heterogeneous real-world datasets, we find that detection effectiveness is driven less by model complexity than by survey structure: instruments with coherent, overlapping item batteries exhibit strong covariance patterns that allow even linear models to reliably separate attentive from inattentive respondents. This reveals a critical "Psychometric-ML Alignment": the same design principles that maximize measurement reliability (e.g., internal consistency) also maximize algorithmic detectability. The framework provides survey platforms with a scalable, domain-agnostic diagnostic tool that links data quality directly to instrument design, enabling auditing without additional respondent burden.
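The claim that coherent item batteries let even linear models separate attentive from inattentive respondents can be illustrated with a minimal sketch. This is not the authors' implementation: PCA reconstruction error stands in for the geometric-reconstruction view, the Percentile Loss is not reproduced because the abstract does not define it, and the simulated data (one latent trait, 20 five-point items, 10% uniformly random responders) and all function names are hypothetical assumptions.

```python
import numpy as np


def simulate_responses(n_attentive, n_random, n_items, rng):
    """Hypothetical survey: attentive rows share one latent trait,
    inattentive rows are uniform random picks on a 1-5 scale."""
    trait = rng.normal(size=(n_attentive, 1))
    noise = rng.normal(scale=0.5, size=(n_attentive, n_items))
    attentive = np.clip(np.rint(3 + trait + noise), 1, 5)
    random_rows = rng.integers(1, 6, size=(n_random, n_items)).astype(float)
    X = np.vstack([attentive, random_rows])
    labels = np.r_[np.zeros(n_attentive, dtype=int), np.ones(n_random, dtype=int)]
    return X, labels


def reconstruction_scores(X, n_components=1):
    """Per-respondent squared error after projecting the centered
    responses onto the top principal components (a linear model)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    basis = vt[:n_components]          # rows are principal directions
    recon = Xc @ basis.T @ basis       # rank-limited reconstruction
    return ((Xc - recon) ** 2).sum(axis=1)


def auc(scores, labels):
    """Probability that a random inattentive row gets a higher score
    than a random attentive row (ties count as one half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties


rng = np.random.default_rng(0)
X, labels = simulate_responses(450, 50, 20, rng)
scores = reconstruction_scores(X)      # fitted without using labels
print(f"AUC on simulated data: {auc(scores, labels):.3f}")
```

The labels are used only to evaluate the scores, never to fit them, which mirrors the label-free setting described above. Weakening the shared trait in `simulate_responses` (for example, by raising the noise scale) should shrink the gap between the two groups, which is one way to probe the link between internal consistency and detectability on synthetic data.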