🤖 AI Summary
In educational assessment, time pressure often induces individual shifts in test takers' response patterns, compromising measurement validity. To address this, we propose a novel latent-variable Item Response Theory (IRT) model that embeds person-specific change points, marking abrupt transitions in response behavior, directly within the classical IRT framework. Our Bayesian method jointly estimates item parameters, person abilities, and the locations of behavioral change points by integrating change-point detection with an extended IRT formulation. Through simulation studies and empirical analyses of two real-world testing datasets, including high-stakes examination data, we demonstrate that the model accurately recovers both psychometric parameters and change-point locations, reveals systematic differences in change-point distributions across testing conditions, and substantially reduces bias in ability estimates, improving scoring accuracy under time pressure in particular. This work establishes an interpretable, generalizable modeling paradigm for dynamic cognitive assessment.
📝 Abstract
Educational assessments are valuable tools for measuring student knowledge and skills, but their validity can be compromised when test takers exhibit changes in response behavior due to factors such as time pressure. To address this issue, we introduce a novel latent factor model with change points for item response data, designed to detect and account for individual-level shifts in response patterns during testing. The model extends traditional Item Response Theory (IRT) by incorporating person-specific change points, enabling simultaneous estimation of item parameters, person latent traits, and the locations of behavioral changes. We evaluate the proposed model through extensive simulation studies, which demonstrate its ability to accurately recover item parameters, change-point locations, and individual ability estimates under a variety of conditions. Our findings show that accounting for change points substantially reduces bias in ability estimates, particularly for respondents affected by time pressure. Applying the model to two real-world educational testing datasets reveals distinct patterns of change-point occurrence between high-stakes and lower-stakes tests, providing insight into how test-taking behavior evolves over the course of a test. This approach offers a more nuanced understanding of test-taking dynamics, with important implications for test design, scoring, and interpretation.
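To make the extension concrete, the following is a minimal sketch of one way a person-specific change point can be embedded in an IRT model, assuming a two-parameter logistic (2PL) base model; the post-change shift $\delta_i$ and this exact parameterization are illustrative assumptions, not necessarily the paper's formulation.

```latex
% Minimal sketch of a change-point 2PL IRT model (illustrative; the
% paper's exact parameterization may differ). Person i answers item j
% with ability \theta_i up to a person-specific change point \tau_i;
% after \tau_i, an assumed shift \delta_i captures the behavioral change.
\[
\tilde{\theta}_{ij} =
  \begin{cases}
    \theta_i,            & j \le \tau_i \\
    \theta_i + \delta_i, & j > \tau_i
  \end{cases}
\qquad
P\bigl(Y_{ij} = 1 \mid \tilde{\theta}_{ij}\bigr) =
  \frac{\exp\!\bigl\{ a_j \bigl( \tilde{\theta}_{ij} - b_j \bigr) \bigr\}}
       {1 + \exp\!\bigl\{ a_j \bigl( \tilde{\theta}_{ij} - b_j \bigr) \bigr\}}
\]
```

Here $a_j$ and $b_j$ are the usual item discrimination and difficulty parameters. Under a formulation like this, a Bayesian sampler that draws $(a_j, b_j, \theta_i, \delta_i, \tau_i)$ jointly would yield the simultaneous estimation described above, with respondents unaffected by time pressure corresponding to $\tau_i$ at (or beyond) the final item or $\delta_i \approx 0$.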