Conversational Education at Scale: A Multi-LLM Agent Workflow for Procedural Learning and Pedagogic Quality Assessment

📅 2025-07-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing LLM-based educational systems scale poorly to large, heterogeneous course content and lack systematic frameworks for assessing pedagogic quality. Method: The paper proposes WikiHowAgent, a multi-LLM agent workflow for procedural knowledge instruction in which Teacher, Learner, Interaction Manager, and Evaluator agents are coordinated through prompt engineering, role-based simulation, and workflow control. Contribution/Results: The authors construct an instructional dataset of 114,296 teacher-learner conversations grounded in 14,287 tutorials across 17 domains and 727 topics, and design an evaluation protocol that combines computational metrics, rubric-based metrics, and alignment with human judgment. Experiments show the workflow is effective across diverse setups and offer insights into LLM capabilities across domains. All data and code are publicly released to advance AI4Education research.

📝 Abstract

Large language models (LLMs) have advanced virtual educators and learners, bridging NLP with AI4Education. Existing work often lacks scalability and fails to leverage diverse, large-scale course content, with limited frameworks for assessing pedagogic quality. To this end, we propose WikiHowAgent, a multi-agent workflow leveraging LLMs to simulate interactive teaching-learning conversations. It integrates teacher and learner agents, an interaction manager, and an evaluator to facilitate procedural learning and assess pedagogic quality. We introduce a dataset of 114,296 teacher-learner conversations grounded in 14,287 tutorials across 17 domains and 727 topics. Our evaluation protocol combines computational and rubric-based metrics with human judgment alignment. Results demonstrate the workflow's effectiveness in diverse setups, offering insights into LLM capabilities across domains. Our datasets and implementations are fully open-sourced.
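
The workflow described in the abstract can be pictured with a minimal sketch. This is not the WikiHowAgent implementation: the agents below are plain stub functions standing in for LLM calls, and every name (`Agent`, `Conversation`, `run_interaction`, `step_coverage`, the tea tutorial) is hypothetical and chosen only for illustration. The toy `step_coverage` score is an assumed stand-in for the paper's computational and rubric-based metrics, which are not specified here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical type: an "agent" is any function mapping a prompt to a reply.
# A real system would wrap an LLM call; here the agents are plain stubs.
Agent = Callable[[str], str]


@dataclass
class Conversation:
    """Ordered list of (role, text) turns."""
    turns: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))


def stub_teacher(step: str) -> str:
    # Stand-in for a teacher agent explaining one tutorial step.
    return f"Next, {step}"


def stub_learner(teacher_message: str) -> str:
    # Stand-in for a learner agent reacting to the teacher.
    return "Understood, what comes next?"


def run_interaction(steps: List[str], teacher: Agent, learner: Agent) -> Conversation:
    """Interaction manager: alternate teacher and learner turns, one step at a time."""
    convo = Conversation()
    for step in steps:
        message = teacher(step)
        convo.add("teacher", message)
        convo.add("learner", learner(message))
    return convo


def step_coverage(steps: List[str], convo: Conversation) -> float:
    """Toy evaluator: fraction of tutorial steps the teacher mentioned verbatim."""
    if not steps:
        return 0.0
    teacher_text = " ".join(text for role, text in convo.turns if role == "teacher")
    covered = sum(1 for step in steps if step in teacher_text)
    return covered / len(steps)


# Hypothetical three-step tutorial used only to exercise the sketch.
tutorial_steps = ["boil the water", "add the tea leaves", "steep for three minutes"]
conversation = run_interaction(tutorial_steps, stub_teacher, stub_learner)
coverage = step_coverage(tutorial_steps, conversation)
```

Passing the agents in as plain callables keeps the manager and evaluator independent of any particular model, so different teacher and learner backends could be swapped in to compare setups.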

Problem

Research questions and friction points this paper is trying to address.

Scalable conversational education using multi-LLM agents

Assessing pedagogic quality in AI-driven learning systems

Leveraging diverse large-scale content for procedural learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent workflow for teaching-learning conversations

Integrates teacher and learner agents with an interaction manager and an evaluator

Large-scale dataset with 114K educational conversations
