🤖 AI Summary
In real-world educational settings, automated scoring systems typically require task-specific model training, incurring high computational, storage, and maintenance overhead. To address this, we propose a knowledge distillation–based Mixture-of-Experts (MoE) framework for multi-task scoring. It employs a shared encoder, a learnable gating mechanism, and lightweight task-specific heads to jointly capture both task-invariant and task-specific representations. Expert modules encapsulate reusable scoring competencies, substantially improving cross-task generalization and enabling zero-shot or few-shot adaptation to new tasks. Evaluated on nine scientific reasoning tasks, our unified model matches the performance of dedicated single-task baselines while reducing model size by 6× compared to an ensemble of independent models and achieving 87× compression relative to a 20B-parameter teacher model. This yields significant gains in efficiency and deployment feasibility.
📝 Abstract
Automated scoring of written constructed responses typically relies on separate models per task, straining computational resources, storage, and maintenance in real-world education settings. We propose UniMoE-Guided, a knowledge-distilled multi-task Mixture-of-Experts (MoE) approach that transfers expertise from multiple task-specific large models (teachers) into a single compact, deployable model (student). The student combines (i) a shared encoder for cross-task representations, (ii) a gated MoE block that balances shared and task-specific processing, and (iii) lightweight task heads. Trained with both ground-truth labels and teacher guidance, the student matches strong task-specific models while being far more efficient to train, store, and deploy. Beyond efficiency, the MoE layer improves transfer and generalization: experts develop reusable skills that boost cross-task performance and enable rapid adaptation to new tasks with minimal additions and tuning. On nine NGSS-aligned science-reasoning tasks (seven for training/evaluation and two held out for adaptation), UniMoE-Guided attains performance comparable to per-task models while using $\sim$6$\times$ less storage than maintaining separate students, and 87$\times$ less than the 20B-parameter teacher. The method offers a practical path toward scalable, reliable, and resource-efficient automated scoring for classroom and large-scale assessment systems.
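The two core mechanisms the abstract names — a gated mixture of experts on top of a shared encoding, and a training loss that blends ground-truth supervision with teacher guidance — can be sketched compactly. The snippet below is an illustrative toy in pure Python, not the paper's implementation: all shapes, names (`moe_forward`, `distill_loss`), and hyperparameters (`alpha`, temperature `T`) are assumptions for exposition.

```python
import math

def linear(x, w):
    # w is an (out x in) weight matrix; returns w @ x.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def moe_forward(h, gate_w, expert_ws):
    """Gated MoE block: the gate scores each expert from the shared
    encoding h, and the output is the gate-weighted sum of expert outputs,
    so experts can specialize while remaining shared across tasks."""
    g = softmax(linear(h, gate_w))            # one mixing weight per expert
    outs = [linear(h, w) for w in expert_ws]  # each expert transforms h
    dim = len(outs[0])
    return [sum(g[e] * outs[e][j] for e in range(len(outs)))
            for j in range(dim)]

def distill_loss(student_logits, teacher_logits, label, alpha=0.5, T=2.0):
    """Blend of ground-truth and teacher guidance (assumed form):
    alpha * CE(label) + (1 - alpha) * KL(teacher || student),
    with both distributions softened by temperature T for the KL term."""
    ce = -math.log(softmax(student_logits)[label])
    p_t = softmax([z / T for z in teacher_logits])
    p_s = softmax([z / T for z in student_logits])
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return alpha * ce + (1.0 - alpha) * kl
```

For example, with two experts over a 3-dimensional encoding, `moe_forward(h, gate_w, [w1, w2])` returns a 3-dimensional mixture; a per-task head then maps that mixture to score logits, and `distill_loss` is accumulated over each task's batches.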