AI Summary
This work addresses the lack of domain knowledge integration and end-to-end cohesion in current automated educational data mining pipelines. To bridge this gap, we propose a multi-agent system tailored for educational research, in which five specialized large language model (LLM) agents collaboratively orchestrate the entire scientific workflow, from problem formulation and data analysis to manuscript generation. The framework embeds educational domain knowledge through a state-machine coordinator, a three-tier educational data registry, and a structured agent communication protocol. For reliability, it also supports iterative revision cycles, checkpoint-based recovery, and sandboxed execution. The system autonomously produces LaTeX-formatted academic papers with verifiable machine learning analyses and authentic citations, and has been open-sourced to support the educational research community.
Abstract
In this technical report, we present the Educational Data Mining Automated Research System (EDM-ARS), a domain-specific multi-agent pipeline that automates end-to-end educational data mining (EDM) research. We conceptualize EDM-ARS as a general framework for domain-aware automated research pipelines, where educational expertise is embedded into each stage of the research lifecycle. As a first instantiation of this framework, we focus on predictive modeling tasks. Within this scope, EDM-ARS orchestrates five specialized LLM-powered agents (ProblemFormulator, DataEngineer, Analyst, Critic, and Writer) through a state-machine coordinator that supports revision loops, checkpoint-based recovery, and sandboxed code execution. Given a research prompt and a dataset, EDM-ARS produces a complete LaTeX manuscript with real Semantic Scholar citations, validated machine learning analyses, and automated methodological peer review. We also describe in detail the system architecture, the three-tier data registry design that encodes educational domain expertise, the specification of each agent, the inter-agent communication protocol, and the mechanisms for error handling and self-correction. Finally, we discuss current limitations, including single-dataset scope and formulaic paper output, and outline a phased roadmap toward causal inference, transfer learning, psychometrics, and multi-dataset generalization. EDM-ARS is released as an open-source project to support the educational research community.
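To make the coordinator idea concrete, the following is a minimal Python sketch of a state machine that steps through the five agents named above, loops back for revisions, and saves a checkpoint after every step. It is an illustration only, not the EDM-ARS implementation: the `Stage` enum, the `run_pipeline` function, the `"revise"` verdict, the `max_revisions` limit, the choice to send revisions back to the Analyst, and the JSON checkpoint layout are all assumptions introduced here, and the agents are stub functions rather than LLM calls.

```python
import json
import tempfile
from enum import Enum
from pathlib import Path


class Stage(str, Enum):
    """Agent names from the abstract, plus a hypothetical terminal state."""
    PROBLEM_FORMULATOR = "ProblemFormulator"
    DATA_ENGINEER = "DataEngineer"
    ANALYST = "Analyst"
    CRITIC = "Critic"
    WRITER = "Writer"
    DONE = "Done"


# Assumed default ordering; the real coordinator may use different transitions.
NEXT_STAGE = {
    Stage.PROBLEM_FORMULATOR: Stage.DATA_ENGINEER,
    Stage.DATA_ENGINEER: Stage.ANALYST,
    Stage.ANALYST: Stage.CRITIC,
    Stage.CRITIC: Stage.WRITER,
    Stage.WRITER: Stage.DONE,
}


def run_pipeline(agents, checkpoint_path, max_revisions=2):
    """Run each stage in turn, writing a JSON checkpoint after every step."""
    path = Path(checkpoint_path)
    if path.exists():
        # Checkpoint-based recovery: resume from the last saved state.
        state = json.loads(path.read_text())
    else:
        state = {
            "stage": Stage.PROBLEM_FORMULATOR.value,
            "revisions": 0,
            "history": [],
        }

    while state["stage"] != Stage.DONE.value:
        stage = Stage(state["stage"])
        verdict = agents[stage](state)
        state["history"].append(stage.value)

        # Revision loop (assumed policy): a "revise" verdict from the Critic
        # sends the work back to the Analyst until the limit is reached.
        wants_revision = stage is Stage.CRITIC and verdict == "revise"
        if wants_revision and state["revisions"] < max_revisions:
            state["revisions"] += 1
            state["stage"] = Stage.ANALYST.value
        else:
            state["stage"] = NEXT_STAGE[stage].value

        path.write_text(json.dumps(state))
    return state


def make_stub_agents():
    """Stub agents: the Critic asks for exactly one revision, then accepts."""
    def passthrough(state):
        return None

    def critic(state):
        return "revise" if state["revisions"] == 0 else "accept"

    agents = {stage: passthrough for stage in NEXT_STAGE}
    agents[Stage.CRITIC] = critic
    return agents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        final = run_pipeline(make_stub_agents(), Path(tmp) / "checkpoint.json")
        print(final["history"])
```

Because the state is written to disk after each stage, rerunning `run_pipeline` with the same checkpoint path resumes from the last completed step instead of starting over. Sandboxed code execution and the inter-agent message format are outside the scope of this sketch.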