🤖 AI Summary
Addressing the difficulty of evaluating LLM-based question-answering systems in education, and their poor interpretability, this paper proposes a modular evaluation framework that enables fine-grained performance analysis of three core components: function calling, information retrieval, and response generation. Methodologically, the framework combines a modular function-call mechanism with a novel structure-aware retrieval method, benchmarked against vector-based retrieval and LLM-based scoring baselines, and supports component-level failure diagnosis and transparent execution tracing. Experiments across multiple large language models on educational QA tasks demonstrate that the framework accurately pinpoints performance bottlenecks at each stage, substantially improving system interpretability (a +38.2% gain in human comprehensibility scores) and pedagogical consistency (91.5% agreement among domain experts). These results indicate that modular design confers tangible benefits in adaptability and controllability for AI systems deployed in educational contexts.
📝 Abstract
With the growing use of Large Language Model (LLM)-based Question-Answering (QA) systems in education, it is critical to evaluate their performance across individual pipeline components. In this work, we introduce {model}, a modular function-calling LLM pipeline, and present a comprehensive evaluation along three key axes: function calling strategies, retrieval methods, and generative language models. Our framework enables fine-grained analysis by isolating and assessing each component. We benchmark function-calling performance across LLMs, compare our novel structure-aware retrieval method to vector-based and LLM-scoring baselines, and evaluate various LLMs for response synthesis. This modular approach reveals specific failure modes and performance patterns, supporting the development of interpretable and effective educational QA systems. Our findings demonstrate the value of modular function calling in improving system transparency and pedagogical alignment. Website and Supplementary Material: https://chancharikmitra.github.io/EduMod-LLM-website/
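The abstract describes a pipeline whose three stages (function calling, retrieval, response generation) can each be swapped out and assessed in isolation. A minimal sketch of that modular structure is given below; all names, interfaces, and the toy corpus are illustrative assumptions, not the paper's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Trace:
    """Records each stage's output, enabling transparent execution tracing."""
    function_call: str
    retrieved: List[str]
    answer: str

def run_pipeline(
    question: str,
    call_fn: Callable[[str], str],                  # function-calling component
    retrieve_fn: Callable[[str], List[str]],        # retrieval component
    generate_fn: Callable[[str, List[str]], str],   # response-generation component
) -> Trace:
    # Each stage is an injected callable, so any one component can be
    # replaced (e.g. a different retrieval method) and benchmarked alone.
    tool = call_fn(question)
    docs = retrieve_fn(question)
    answer = generate_fn(question, docs)
    return Trace(tool, docs, answer)

# Toy stand-ins showing component-level isolation (hypothetical data):
corpus = {"syllabus": "Office hours are Tue 3-5pm."}

trace = run_pipeline(
    "When are office hours?",
    call_fn=lambda q: "search_syllabus",
    retrieve_fn=lambda q: [corpus["syllabus"]],
    generate_fn=lambda q, docs: docs[0] if docs else "I don't know.",
)
print(trace.answer)  # → Office hours are Tue 3-5pm.
```

Because every intermediate result is captured in the `Trace`, a failure can be attributed to a specific stage (wrong tool chosen, wrong documents retrieved, or a poor generation), which is the kind of component-level diagnosis the framework targets.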