🤖 AI Summary
K-12 educational materials are inherently multimodal (text and images), yet existing models struggle to handle the core educational tasks (knowledge recommendation, knowledge tracing, time cost prediction, and user answer prediction) within a single framework. To address this, we propose UniEDU, the first full-stack multimodal unified assistant for K-12 education. It integrates language and vision encoders, incorporates a task-aware prompting mechanism, and uses lightweight adapter-based fine-tuning, so that one architecture replaces multiple task-specific models. The approach models all four educational tasks jointly and end to end, which is the first such effort, and achieves SOTA or near-SOTA performance across multiple benchmarks with strong generalization. Industrial deployment efficiency improves by approximately 300% over baseline methods, while performance remains comparable to full fine-tuning. The system has been deployed in real-world learning environments, demonstrating practical utility and scalability.
📝 Abstract
Educational materials for K-12 students often combine multiple modalities, such as text and images, which makes it difficult for models to fully understand the nuanced information they contain. In this paper, we propose UniEDU, a unified language and vision assistant designed for a range of educational applications, including knowledge recommendation, knowledge tracing, time cost prediction, and user answer prediction, all within a single model. Unlike conventional task-specific models, UniEDU offers a unified solution that performs well across multiple educational tasks while maintaining strong generalization. This adaptability makes it well suited for real-world deployment in diverse learning environments. Furthermore, UniEDU is optimized for industry-scale deployment: it significantly reduces computational overhead, achieving an approximately 300% increase in efficiency, while remaining competitive with fully fine-tuned models at minimal performance degradation. This work represents a significant step toward versatile AI systems tailored to the evolving demands of education.