Beyond Factual QA: Mentorship-Oriented Question Answering over Long-Form Multilingual Content

📅 2026-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of current question-answering systems, which prioritize factual correctness but fall short in educational and career-guidance contexts that require reflective and pedagogically supportive responses. To bridge this gap, the authors propose a novel "mentor-style" QA paradigm and introduce MentorQA, the first multilingual long-video QA benchmark, comprising nearly 9,000 question-answer pairs across four languages. Beyond factual accuracy, they define new evaluation dimensions, namely clarity, alignment, and learning value, to better capture pedagogical quality. Through systematic comparisons of Single-Agent, Dual-Agent, RAG, and Multi-Agent architectures, the study demonstrates that multi-agent approaches significantly outperform the others on complex topics and lower-resource languages. Furthermore, the research reveals a notable discrepancy between current LLM-based automatic evaluations and human judgments, highlighting the need for more nuanced assessment frameworks.

📝 Abstract
Question answering systems are typically evaluated on factual correctness, yet many real-world applications, such as education and career guidance, require mentorship: responses that provide reflection and guidance. Existing QA benchmarks rarely capture this distinction, particularly in multilingual and long-form settings. We introduce MentorQA, the first multilingual dataset and evaluation framework for mentorship-focused question answering from long-form videos, comprising nearly 9,000 QA pairs from 180 hours of content across four languages. We define mentorship-focused evaluation dimensions that go beyond factual accuracy, capturing clarity, alignment, and learning value. Using MentorQA, we compare Single-Agent, Dual-Agent, RAG, and Multi-Agent QA architectures under controlled conditions. Multi-Agent pipelines consistently produce higher-quality mentorship responses, with especially strong gains for complex topics and lower-resource languages. We further analyze the reliability of automated LLM-based evaluation, observing substantial variation in alignment with human judgments. Overall, this work establishes mentorship-focused QA as a distinct research problem and provides a multilingual benchmark for studying agentic architectures and evaluation design in educational AI. The dataset and evaluation framework are released at https://github.com/AIM-SCU/MentorQA.
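The abstract compares Single-Agent and Multi-Agent QA architectures without detailing their internals. A minimal sketch of what a multi-agent mentorship pipeline might look like is shown below; the agent roles (`plan_agent`, `mentor_agent`, `review_agent`) and their division of labor are illustrative assumptions, not the paper's actual architecture, and the model calls are stubbed out with placeholder strings.

```python
# Hypothetical multi-agent mentorship QA pipeline. Agent names and roles are
# illustrative assumptions; real agents would wrap LLM calls.

def plan_agent(question: str) -> list[str]:
    """Decompose the question into mentorship sub-goals
    (explanation plus reflective guidance)."""
    return [
        f"Explain: {question}",
        f"Reflect: what should the learner try next after '{question}'?",
    ]

def mentor_agent(subgoal: str, transcript: str) -> str:
    """Draft a response to one sub-goal, grounded in the
    long-form video transcript (stubbed here)."""
    return f"[draft for '{subgoal}' using {len(transcript)} chars of context]"

def review_agent(drafts: list[str]) -> str:
    """Merge drafts into one answer, where a real reviewer would
    also check clarity, alignment, and learning value."""
    return " ".join(drafts)

def answer(question: str, transcript: str) -> str:
    subgoals = plan_agent(question)
    drafts = [mentor_agent(g, transcript) for g in subgoals]
    return review_agent(drafts)

print(answer("How do I start a career in NLP?", "lecture transcript ..."))
```

A Single-Agent baseline would collapse all three steps into one model call; the decomposition above is one plausible reason multi-agent pipelines could score higher on learning value for complex topics.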
Problem

Research questions and friction points this paper is trying to address.

mentorship-oriented QA
long-form content
multilingual QA
educational AI
question answering evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

mentorship-oriented QA
multilingual long-form QA
multi-agent architecture
beyond factual accuracy
LLM-based evaluation
Parth Bhalerao
Santa Clara University - Santa Clara, USA
Diola Dsouza
Santa Clara University - Santa Clara, USA
Ruiwen Guan
Santa Clara University - Santa Clara, USA
Oana Ignat
Assistant Professor of Computer Science at Santa Clara University
AI · Machine Learning · Computer Vision · Natural Language Processing · Mathematics