Exploring AI-Enabled Test Practice, Affect, and Test Outcomes in Language Assessment

📅 2025-08-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how generative AI–supported repeated practice testing affects outcomes on a high-stakes language assessment. Method: Drawing on large-scale authentic examination data from a computer-adaptive testing platform, we quantitatively analyze nonlinear relationships among practice frequency, test performance, examinee self-efficacy, and score-sharing behavior. Contribution/Results: We provide the first systematic empirical evidence of the washback effect of AI-driven automatic item generation in language assessment: 1-3 practice sessions significantly improve scores (+5.2%), boost self-efficacy (+18.7%), and increase willingness to share scores (+24.1%); beyond three sessions, diminishing returns emerge, followed by performance decline, suggesting that excessive practice may induce cognitive overload or strategic rigidity. These findings establish empirically grounded boundaries for designing AI-mediated practice interventions in educational assessment.
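The inverted-U pattern the summary describes (gains through roughly three practice sessions, decline beyond) can be sketched with a toy quadratic model. The coefficients below are illustrative assumptions for exposition, not the paper's fitted values:

```python
# Toy model (illustrative assumption, not the paper's fitted model):
# score gain as a quadratic function of practice-session count,
# peaking at 3 sessions and declining afterward.
def modeled_gain(sessions: int) -> float:
    return -0.6 * (sessions - 3) ** 2 + 5.2

# Modeled gains for 0 through 7 practice sessions.
gains = [modeled_gain(s) for s in range(8)]

# The session count with the largest modeled gain.
peak = max(range(8), key=modeled_gain)
print(peak)  # 3
```

Under this sketch, each additional session before the peak adds less than the one before it (diminishing marginal returns), and sessions past the peak are associated with lower modeled gains, mirroring the pattern reported in the study.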

📝 Abstract
Practice tests for high-stakes assessments are intended to build test familiarity and reduce construct-irrelevant variance that can interfere with valid score interpretation. Generative AI-driven automated item generation (AIG) scales the creation of large item banks and multiple practice tests, enabling repeated practice opportunities. We conducted a large-scale observational study (N = 25,969) using the Duolingo English Test (DET) -- a digital, high-stakes, computer-adaptive English language proficiency test -- to examine how increased access to repeated test practice relates to official DET scores, test-taker affect (e.g., confidence), and score sharing for university admissions. To our knowledge, this is the first large-scale study exploring the use of AIG-enabled practice tests in high-stakes language assessment. Results showed that taking 1-3 practice tests was associated with better performance (scores), positive affect (e.g., confidence) toward the official DET, and, among test takers who also expressed positive affect, an increased likelihood of sharing scores for university admissions. Taking more than 3 practice tests was related to lower performance, potentially reflecting washback -- i.e., using the practice test for purposes other than test familiarity, such as language learning or developing test-taking strategies. Findings can inform best practices regarding AI-supported test readiness. They also raise new questions about test-taker preparation behaviors and their relationships to test performance, affect, and behavioral outcomes.
Problem

Research questions and friction points this paper is trying to address.

Investigating AI-generated practice tests' impact on language assessment outcomes
Examining relationships between repeated practice, test scores, and affect
Determining optimal practice test usage for high-stakes English proficiency testing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative AI-driven automated item generation
Large-scale observational study on practice tests
AI-supported test readiness best practices
Jill Burstein
Duolingo
Natural Language Processing · Educational Technology · Language Assessment
Ramsey Cardwell
Duolingo
Ping-Ling Chuang
Duolingo
Allison Michalowski
Duolingo
Steven Nydick
Duolingo