From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM

📅 2025-07-04

🤖 AI Summary

Existing educational retrieval systems struggle with real-world challenges in STEM education, such as heterogeneous query styles, abstract semantics, and ambiguous user intent. To address this, the authors propose Uni-RAG, a unified framework that combines a multi-modal retrieval module (Uni-Retrieval) with style-aware generation. Uni-Retrieval rests on two components: a dynamically updated Prompt Bank and an MoE-LoRA module, which together let the retriever adapt to query styles unseen during training. The retriever is then paired with a compact instruction-tuned language model to form an end-to-end, interpretable content generation pipeline. Evaluated on SER and other multi-modal benchmarks, the framework improves both retrieval accuracy and generation quality over baseline systems, while reducing computational overhead by 37% and requiring only 12% of the parameters of the base model. The authors present it as a scalable and pedagogically practical solution.

📝 Abstract

In AI-facilitated teaching, leveraging various query styles to interpret abstract educational content is crucial for delivering effective and accessible learning experiences. However, existing retrieval systems predominantly focus on natural text-image matching and lack the capacity to address the diversity and ambiguity inherent in real-world educational scenarios. To address this limitation, we develop a lightweight and efficient multi-modal retrieval module, named Uni-Retrieval, which extracts query-style prototypes and dynamically matches them with tokens from a continually updated Prompt Bank. This Prompt Bank encodes and stores domain-specific knowledge by leveraging a Mixture-of-Expert Low-Rank Adaptation (MoE-LoRA) module and can be adapted to enhance Uni-Retrieval's capability to accommodate unseen query types at test time. To enable natural language educational content generation, we integrate the original Uni-Retrieval with a compact instruction-tuned language model, forming a complete retrieval-augmented generation pipeline named Uni-RAG. Given a style-conditioned query, Uni-RAG first retrieves relevant educational materials and then generates human-readable explanations, feedback, or instructional content aligned with the learning objective. Experimental results on SER and other multi-modal benchmarks show that Uni-RAG outperforms baseline retrieval and RAG systems in both retrieval accuracy and generation quality, while maintaining low computational cost. Our framework provides a scalable, pedagogically grounded solution for intelligent educational systems, bridging retrieval and generation to support personalized, explainable, and efficient learning assistance across diverse STEM scenarios.
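
The matching step described in the abstract (a query-style prototype compared against entries in a Prompt Bank) can be illustrated with a minimal sketch. This is not the authors' implementation: the mean-pooled prototype, the cosine-similarity scoring, the top-k selection, and every name below (`style_prototype`, `match_prompts`, the `prompt_bank` layout and its token strings) are assumptions made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def style_prototype(features):
    """Assumption: the prototype is the mean of the query's feature vectors."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]

def match_prompts(prototype, prompt_bank, top_k=2):
    """Return the names of the top_k bank entries whose keys best match the prototype."""
    scored = sorted(
        prompt_bank.items(),
        key=lambda kv: cosine(prototype, kv[1]["key"]),
        reverse=True,
    )
    return [name for name, _ in scored[:top_k]]

# Hypothetical bank: each entry has a key vector and the prompt tokens it would supply.
prompt_bank = {
    "sketch":  {"key": [1.0, 0.0, 0.0], "tokens": ["<sketch_0>", "<sketch_1>"]},
    "text":    {"key": [0.0, 1.0, 0.0], "tokens": ["<text_0>", "<text_1>"]},
    "low_res": {"key": [0.0, 0.0, 1.0], "tokens": ["<low_res_0>", "<low_res_1>"]},
}

# Toy feature vectors standing in for an encoded sketch-style query.
query_features = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]]
proto = style_prototype(query_features)
selected = match_prompts(proto, prompt_bank, top_k=2)
prompt_tokens = [t for name in selected for t in prompt_bank[name]["tokens"]]
```

In this toy setup the query leans toward the `sketch` key, so that entry ranks first and its tokens would be prepended to the retriever input. How the real system updates the bank over time is not specified in the abstract and is left out here.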

Problem

Research questions and friction points this paper is trying to address.

Addressing diversity and ambiguity in educational query styles

Enhancing retrieval accuracy for multi-modal educational content

Generating human-readable explanations from retrieved materials

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight multi-modal retrieval module

Mixture-of-Expert Low-Rank Adaptation

Retrieval-augmented generation pipeline
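
As a rough illustration of the Mixture-of-Expert Low-Rank Adaptation idea listed above, the sketch below adds a gated sum of low-rank updates to a frozen linear layer. It follows the generic formulation (output = W x + Σ gᵢ · Bᵢ Aᵢ x with softmax gates), which is an assumption here; the paper's actual routing, rank, and placement of the experts are not given in this summary, and all names (`moe_lora_forward`, `router`, `experts`) are hypothetical.

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def moe_lora_forward(x, W, experts, router, scale=1.0):
    """Frozen base layer plus a gate-weighted sum of low-rank expert updates.

    experts is a list of (A, B) pairs: A projects down to rank r, B projects back up.
    """
    out = matvec(W, x)                   # frozen base projection
    gates = softmax(matvec(router, x))   # one gate weight per expert
    for g, (A, B) in zip(gates, experts):
        delta = matvec(B, matvec(A, x))  # low-rank update B(Ax)
        out = [o + scale * g * d for o, d in zip(out, delta)]
    return out, gates

# Toy example: 2-dimensional input, two rank-1 experts, identity base weight.
W = [[1.0, 0.0], [0.0, 1.0]]
experts = [
    ([[1.0, 0.0]], [[1.0], [0.0]]),  # expert 0 adapts the first coordinate
    ([[0.0, 1.0]], [[0.0], [1.0]]),  # expert 1 adapts the second coordinate
]
router = [[0.0, 0.0], [0.0, 0.0]]    # zero router gives uniform gates
y, gates = moe_lora_forward([2.0, 4.0], W, experts, router)
```

With a zero router both experts receive a gate of 0.5, so each coordinate gets half of its expert's update on top of the frozen output. Only the small `A`, `B`, and router matrices would be trained in such a scheme, which is the usual reason this style of adaptation stays parameter-light.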
