🤖 AI Summary
This study addresses the degradation of GUI performance-model validity in crowdsourced experiments caused by participants who disregard instructions or interact carelessly. To mitigate this, the authors propose a pre-task screening mechanism: a brief GUI interaction resembling the main task, here resizing an on-screen image to match a physical card, administered before the primary experiment. Each participant's interaction error is captured as a continuous data-quality signal, so an acceptance threshold can be set, and tightened, to filter out low-quality contributors. Empirical evaluations on both mouse-based and smartphone platforms demonstrate that this approach substantially reduces the prevalence of anomalous behavior and significantly improves the goodness of fit and predictive accuracy of GUI performance models, providing a practical screening procedure for securing data quality in crowdsourced human-computer interaction research.
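The screening rule itself is simple to state: accept a worker only if their pre-task error falls within a chosen threshold, and tighten that threshold to trade participant yield for data quality. A minimal sketch in Python, where the worker IDs, error values, and millimetre units are all hypothetical placeholders, not the paper's actual data:

```python
def screen_workers(pretask_errors: dict[str, float], threshold_mm: float) -> set[str]:
    """Return IDs of workers whose pre-task resizing error is within
    the acceptance threshold (names and units are illustrative)."""
    return {wid for wid, err in pretask_errors.items() if err <= threshold_mm}

# Hypothetical pre-task resizing errors (mm) for five workers.
errors = {"w1": 0.8, "w2": 4.2, "w3": 1.5, "w4": 9.7, "w5": 2.1}

# Tightening the threshold admits fewer, but cleaner, contributors.
for threshold in (10.0, 5.0, 2.0):
    accepted = screen_workers(errors, threshold)
    print(f"threshold={threshold:>4} mm -> accepted {sorted(accepted)}")
```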
📝 Abstract
In crowdsourced user experiments that collect performance data from graphical user interface (GUI) interactions, some participants ignore instructions or act carelessly, threatening the validity of performance models. We investigate a pre-task screening method that requires simple GUI operations analogous to the main task and uses the resulting error as a continuous quality signal. Our pre-task is a brief image-resizing task in which workers match an on-screen card to a physical card; workers whose resizing error exceeds a threshold are excluded from the main experiment. The main task is a standardized pointing experiment with well-established models of movement time and error rate. Across mouse- and smartphone-based crowdsourced experiments, we show that screening out workers who exhibit unexpected behavior and tightening the pre-task threshold systematically improve the goodness of fit and predictive accuracy of GUI performance models, demonstrating that brief pre-task screening can enhance data quality.
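The "well-established model of movement time" for pointing is presumably Fitts' law, $MT = a + b \log_2(D/W + 1)$, in which case the screening's effect can be quantified as the change in the fitted model's $R^2$. A minimal sketch of that fit, with all data values fabricated for illustration rather than taken from the experiments:

```python
import numpy as np

# Hypothetical (ID, movement time) pairs from a pointing experiment;
# ID = log2(D/W + 1) is the Shannon formulation of the index of difficulty.
ID = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # bits
MT = np.array([0.45, 0.62, 0.81, 0.97, 1.18])   # seconds

# Ordinary least-squares fit of Fitts' law: MT = a + b * ID.
b, a = np.polyfit(ID, MT, 1)

# Goodness of fit (R^2): higher values after screening would indicate
# that excluding careless workers improved model validity.
pred = a + b * ID
ss_res = np.sum((MT - pred) ** 2)
ss_tot = np.sum((MT - MT.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(f"a={a:.3f} s, b={b:.3f} s/bit, R^2={r2:.3f}")
```

Refitting this model on the data retained at each candidate threshold, and comparing the resulting $R^2$ values, is one straightforward way to reproduce the kind of comparison the abstract describes.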