🤖 AI Summary
Addressing privacy and ethical risks in automated scoring for educational research, this paper proposes a privacy-preserving automated scoring framework based on federated learning: student response data remains exclusively on local school devices, and only encrypted model parameters are uploaded to a central server. We introduce an adaptive weighted averaging aggregation strategy that dynamically accommodates heterogeneous data distributions across institutions, improving model convergence and cross-institutional robustness and overcoming a key limitation of conventional anonymization methods, which still require access to raw data. Evaluated on real assessment data from nine middle schools, our approach achieves scoring accuracy statistically comparable to centralized training (paired t-test, p = 0.051) while substantially reducing data collection and deployment overhead. This work establishes a compliant, scalable paradigm for AI-driven educational assessment.
📝 Abstract
Data privacy remains a critical concern in educational research, necessitating Institutional Review Board (IRB) approval and stringent data-handling protocols to ensure compliance with ethical standards. Traditional approaches rely on anonymization and controlled data-sharing mechanisms to enable research while mitigating privacy risks. However, these methods still involve direct access to raw student data, introducing potential vulnerabilities and incurring substantial time costs. This study proposes a federated learning (FL) framework for automatic scoring in educational assessments that eliminates the need to share raw data. Our approach leverages client-side model training: student responses are processed locally on edge devices, and only optimized model parameters are shared with a central aggregation server. To aggregate heterogeneous model updates effectively, we introduce an adaptive weighted averaging strategy that dynamically adjusts each client's weight contribution based on its learning characteristics, ensuring robust model convergence while preserving privacy. We evaluate our framework on assessment data from nine middle schools, comparing the accuracy of federated scoring models with that of traditionally trained centralized models. A paired t-test ($t(8) = 2.29, p = 0.051$) indicates that the accuracy difference between the two approaches is not statistically significant, demonstrating that federated learning achieves comparable performance while safeguarding student data. Furthermore, our method substantially reduces data collection, processing, and deployment overhead, accelerating the adoption of AI-driven educational assessments in a privacy-compliant manner.
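The server-side aggregation step can be illustrated with a minimal sketch. The abstract does not specify the exact adaptive weighting rule, so the function below is a generic weighted parameter average: each client's weight (which in the paper's framework would be derived from client-specific learning characteristics) is normalized, and the server combines the locally trained parameter vectors accordingly. The function name and the weighting inputs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def adaptive_weighted_average(client_params, client_weights):
    """Aggregate client model parameters by a normalized weighted average.

    client_params  - list of per-client parameter arrays (same shape)
    client_weights - per-client weights; in the paper these would be
                     adapted to client-specific learning characteristics
                     (the exact rule is not given in the abstract, so
                     raw weights are taken as input here)
    """
    w = np.asarray(client_weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    stacked = np.stack([np.asarray(p, dtype=float) for p in client_params])
    # Weighted sum over the client axis: sum_i w_i * params_i
    return np.tensordot(w, stacked, axes=1)

# Hypothetical example: two clients, the second weighted 3x the first
aggregated = adaptive_weighted_average([[1.0, 1.0], [3.0, 3.0]], [1.0, 3.0])
```

In a full FL loop, this aggregation would run once per communication round, after each school's edge device uploads its (encrypted) parameter update.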
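The reported significance test ($t(8) = 2.29$, $p = 0.051$) is a paired t-test over the nine schools' accuracy pairs (federated vs. centralized), giving $df = 9 - 1 = 8$. Since the per-school accuracies are not given in the abstract, the sketch below only shows how the t statistic is computed from matched samples; the example values in the test are hypothetical.

```python
import math

def paired_t_test(a, b):
    """Paired t statistic for two matched samples; returns (t, df).

    a, b - equal-length sequences of paired measurements, e.g.
           per-school accuracies of two scoring models.
    """
    n = len(a)
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 denominator)
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1
```

The resulting statistic would then be compared against the t distribution with $n-1$ degrees of freedom (for $n = 9$ schools, $df = 8$, matching the abstract) to obtain the two-sided p-value.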