🤖 AI Summary
Existing educational retrieval systems struggle with real-world challenges in STEM education, such as heterogeneous query styles, abstract semantics, and ambiguous user intent. To address this, we propose Uni-Retrieval, a unified framework that integrates multimodal retrieval with style-aware generation. It introduces two key innovations: the first dynamic Prompt Bank and an MoE-LoRA collaborative mechanism, which together enable adaptive matching to unseen query styles. Additionally, we integrate a lightweight instruction-tuned language model to form Uni-RAG, an end-to-end, interpretable content generation pipeline. Evaluated on multimodal benchmarks including SER, the framework achieves significant improvements in both retrieval accuracy and generation quality, while reducing computational overhead by 37% and requiring only 12% of the parameters of the base model. The framework demonstrates strong scalability and pedagogical practicality.
📝 Abstract
In AI-facilitated teaching, leveraging various query styles to interpret abstract educational content is crucial for delivering effective and accessible learning experiences. However, existing retrieval systems predominantly focus on natural text-image matching and lack the capacity to address the diversity and ambiguity inherent in real-world educational scenarios. To address this limitation, we develop a lightweight and efficient multi-modal retrieval module, named Uni-Retrieval, which extracts query-style prototypes and dynamically matches them with tokens from a continually updated Prompt Bank. This Prompt Bank encodes and stores domain-specific knowledge by leveraging a Mixture-of-Experts Low-Rank Adaptation (MoE-LoRA) module and can be adapted to enhance Uni-Retrieval's capability to accommodate unseen query types at test time. To enable natural language educational content generation, we integrate Uni-Retrieval with a compact instruction-tuned language model, forming a complete retrieval-augmented generation pipeline named Uni-RAG. Given a style-conditioned query, Uni-RAG first retrieves relevant educational materials and then generates human-readable explanations, feedback, or instructional content aligned with the learning objective. Experimental results on SER and other multi-modal benchmarks show that Uni-RAG outperforms baseline retrieval and RAG systems in both retrieval accuracy and generation quality, while maintaining low computational cost. Our framework provides a scalable, pedagogically grounded solution for intelligent educational systems, bridging retrieval and generation to support personalized, explainable, and efficient learning assistance across diverse STEM scenarios.
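The abstract's idea of matching a query-style prototype against Prompt Bank entries can be illustrated with a small sketch. It assumes, purely for illustration, that each bank entry has a key vector and that matching is done by cosine similarity with top-k selection; the abstract does not specify the similarity measure, the entry layout, or any of the names below (`select_prompts`, `prompt_bank`, the style labels, and the toy vectors are all hypothetical).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_prompts(prototype, prompt_bank, top_k=2):
    """Return the names of the top_k bank entries whose keys best match the prototype.

    prompt_bank maps a hypothetical entry name to a (key_vector, prompt_tokens) pair.
    """
    scored = [
        (cosine(prototype, key), name)
        for name, (key, _tokens) in prompt_bank.items()
    ]
    # Sort by similarity only, highest first; ties keep insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _score, name in scored[:top_k]]


# Toy bank: keys, tokens, and style names are invented for illustration only.
prompt_bank = {
    "sketch": ([1.0, 0.0, 0.0], ["<p_sketch>"]),
    "art": ([0.0, 1.0, 0.0], ["<p_art>"]),
    "low_res": ([0.0, 0.0, 1.0], ["<p_lowres>"]),
}

# Stand-in for a style prototype extracted from a query.
query_prototype = [0.9, 0.3, 0.1]
selected = select_prompts(query_prototype, prompt_bank, top_k=2)
```

In a pipeline of the kind the abstract describes, the prompt tokens of the selected entries would then condition the retriever. How the bank is updated and how MoE-LoRA shapes its contents is outside the scope of this sketch.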