Privacy-Preserved Automated Scoring using Federated Learning for Educational Research

📅 2025-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing privacy and ethical risks in automated scoring for educational research, this paper proposes a privacy-preserving automated scoring framework based on federated learning: student response data remains exclusively on local school devices, with only encrypted model parameters uploaded to a central server. We introduce an adaptive weighted averaging aggregation strategy that dynamically accommodates heterogeneous data distributions across institutions, thereby improving model convergence and cross-institutional robustness—overcoming the limitation of conventional anonymization methods, which still require access to raw data. Evaluated on real assessment data from nine middle schools, our approach achieves scoring accuracy with no statistically significant difference from centralized training (paired t-test, p = 0.051), while substantially reducing data collection and deployment overhead. This work establishes a compliant, scalable paradigm for AI-driven educational assessment.

📝 Abstract
Data privacy remains a critical concern in educational research, necessitating Institutional Review Board (IRB) certification and stringent data handling protocols to ensure compliance with ethical standards. Traditional approaches rely on anonymization and controlled data-sharing mechanisms to facilitate research while mitigating privacy risks. However, these methods still involve direct access to raw student data, introducing potential vulnerabilities and considerable time overhead. This study proposes a federated learning (FL) framework for automatic scoring in educational assessments, eliminating the need to share raw data. Our approach leverages client-side model training, where student responses are processed locally on edge devices, and only optimized model parameters are shared with a central aggregation server. To effectively aggregate heterogeneous model updates, we introduce an adaptive weighted averaging strategy, which dynamically adjusts weight contributions based on client-specific learning characteristics. This method ensures robust model convergence while preserving privacy. We evaluate our framework using assessment data from nine middle schools, comparing the accuracy of federated learning-based scoring models with traditionally trained centralized models. A statistical significance test (paired t-test, $t(8) = 2.29, p = 0.051$) indicates that the accuracy difference between the two approaches is not statistically significant, demonstrating that federated learning achieves comparable performance while safeguarding student data. Furthermore, our method significantly reduces data collection, processing, and deployment overhead, accelerating the adoption of AI-driven educational assessments in a privacy-compliant manner.
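The aggregation step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the specific weighting rule below (each client's data size scaled by its recent loss improvement) is a hypothetical instantiation of "adaptive weighted averaging", since the exact formula is not given here.

```python
# Sketch of a server-side adaptive weighted averaging round.
# Hypothetical weighting rule: weight each client by data size times
# its training-loss improvement this round (NOT the paper's exact rule).
from dataclasses import dataclass
from typing import List

@dataclass
class ClientUpdate:
    params: List[float]   # locally optimized model parameters (only these leave the device)
    n_samples: int        # size of the client's local dataset
    loss_drop: float      # training-loss improvement observed this round

def adaptive_weighted_average(updates: List[ClientUpdate]) -> List[float]:
    """Aggregate client parameters with weights that adapt to each
    client's data size and learning progress (hypothetical rule)."""
    raw = [u.n_samples * max(u.loss_drop, 1e-8) for u in updates]
    total = sum(raw)
    weights = [r / total for r in raw]  # normalized so weights sum to 1
    dim = len(updates[0].params)
    return [sum(w * u.params[i] for w, u in zip(weights, updates))
            for i in range(dim)]

# Raw student responses never leave the schools; only parameters are uploaded.
clients = [
    ClientUpdate(params=[0.2, 1.0], n_samples=120, loss_drop=0.05),
    ClientUpdate(params=[0.4, 0.8], n_samples=300, loss_drop=0.02),
]
global_params = adaptive_weighted_average(clients)
```

Clients with more data or faster learning progress pull the global model toward their parameters, which is one way to accommodate the heterogeneous data distributions across institutions that the abstract describes.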
Problem

Research questions and friction points this paper is trying to address.

How to ensure data privacy in educational research without direct access to raw student data.
How to eliminate raw-data sharing from automated scoring pipelines.
Whether privacy-preserving training can match the accuracy of centralized models.
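The paper's accuracy comparison rests on a paired t-test over the nine schools. A sketch of that computation, using hypothetical per-school accuracy pairs (not the paper's data, so the resulting t differs from the reported $t(8) = 2.29$):

```python
# Paired t-test over per-school accuracies: t = d_bar / (s_d / sqrt(n)),
# with df = n - 1. The accuracy values below are hypothetical stand-ins.
import math
from statistics import mean, stdev

centralized = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92, 0.89, 0.90, 0.91]
federated   = [0.90, 0.87, 0.92, 0.90, 0.86, 0.91, 0.88, 0.90, 0.90]

diffs = [c - f for c, f in zip(centralized, federated)]  # per-school differences
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # df = n - 1 = 8
print(f"t({n - 1}) = {t:.2f}")
```

With nine schools the degrees of freedom are 8, matching the abstract's $t(8)$; the reported $p = 0.051$ then follows from comparing $t = 2.29$ against the t-distribution with 8 degrees of freedom.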
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated learning eliminates raw data sharing.
Client-side model training preserves student privacy.
Adaptive weighted averaging ensures robust model convergence.