🤖 AI Summary
Problem: Large language models (LLMs) exhibit domain-knowledge gaps and limited multi-step reasoning in biomechanical engineering education. Method: This paper proposes a dual-module intelligent tutoring framework that integrates Retrieval-Augmented Generation (RAG) with a Multi-Agent System (MAS), built on open-source LLMs (e.g., Qwen, Llama). RAG dynamically injects authoritative textbook and literature knowledge to improve conceptual accuracy; MAS decomposes each task across specialized agents for equation derivation, numerical computation, and solution explanation, yielding traceable, verifiable multi-step reasoning. Results: RAG improves conceptual question accuracy by 23.6% and markedly enhances response stability; MAS achieves a 91.4% success rate on representative biomechanical computation tasks and autonomously generates interpretable solutions that mix natural language and code. This work provides a reusable methodology and empirical validation for designing domain-specific AI educational agents.
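The paper itself does not provide code, but the retrieve-then-augment pattern the summary describes can be sketched minimally as follows. Everything here is an illustrative assumption rather than the paper's implementation: the hashed bag-of-words embedder is a toy stand-in for a trained text embedder, and the helper names (`embed`, `retrieve`, `build_rag_prompt`) are hypothetical. In the actual system, the augmented prompt would be passed to an LLM such as Qwen or Llama.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hash each token into a fixed-size count vector.
    A real RAG system would use a trained text embedder instead."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query (cosine)."""
    q = embed(query)
    scores = [float(q @ embed(passage)) for passage in corpus]
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

def build_rag_prompt(question: str, corpus: list[str]) -> str:
    """Prepend retrieved textbook passages so the LLM answers from them."""
    context = "\n".join(f"- {p}" for p in retrieve(question, corpus))
    return (
        "Answer the true/false question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer (True/False) with a brief justification:"
    )

# Tiny stand-in for the authoritative textbook/literature knowledge base.
corpus = [
    "The moment of a force about a joint equals the force times its moment arm.",
    "Muscles can only pull; they generate tension, never compression.",
    "Static equilibrium requires the net force and net moment to be zero.",
]

print(build_rag_prompt(
    "True or false: muscles can actively push on a bone.", corpus))
```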
📝 Abstract
While large language models (LLMs) have demonstrated remarkable versatility across a wide range of general tasks, their effectiveness often diminishes in domain-specific applications due to inherent knowledge gaps. Moreover, their performance typically declines on complex problems that require multi-step reasoning and analysis. In response to these challenges, we propose leveraging both LLMs and AI agents to develop education assistants aimed at enhancing undergraduate learning in biomechanics courses that focus on analyzing forces and moments in the musculoskeletal system of the human body. To this end, we construct a dual-module framework that enhances LLM performance on biomechanics educational tasks: 1) we apply Retrieval-Augmented Generation (RAG) to improve the specificity and logical consistency of LLMs' responses to conceptual true/false questions; and 2) we build a Multi-Agent System (MAS) to solve calculation-oriented problems involving multi-step reasoning and code execution. Specifically, we evaluate several LLMs, namely Qwen-1.0-32B, Qwen-2.5-32B, and Llama-70B, on a biomechanics dataset comprising 100 true/false conceptual questions and problems requiring equation derivation and calculation. Our results demonstrate that RAG significantly enhances the performance and stability of LLMs on conceptual questions, surpassing those of the vanilla models. In turn, the MAS constructed from multiple LLMs demonstrates the ability to perform multi-step reasoning, derive equations, execute code, and generate explainable solutions for calculation tasks. These findings highlight the potential of applying RAG and MAS to enhance LLM performance in specialized engineering courses, providing a promising direction for developing intelligent tutoring systems in engineering education.
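As a hedged illustration of the MAS decomposition described above, the sketch below routes a representative static-equilibrium problem (biceps force at the elbow) through three stand-in agents via a simple coordinator. All names, the `Problem` fields, and the example numbers are hypothetical; in the paper's system each role is played by an LLM, and the computation agent writes and executes code rather than evaluating a fixed formula.

```python
from dataclasses import dataclass

@dataclass
class Problem:
    weight_n: float      # weight held in the hand (N)
    load_arm_m: float    # distance from elbow joint to hand (m)
    muscle_arm_m: float  # biceps moment arm about the elbow (m)

def derivation_agent(p: Problem) -> str:
    """Agent 1: state the governing equation (as text).
    In the real system, an LLM derives this symbolically."""
    return ("Static equilibrium of moments about the elbow: "
            "F_muscle * d_muscle = W * d_load  =>  "
            "F_muscle = W * d_load / d_muscle")

def computation_agent(p: Problem) -> float:
    """Agent 2: evaluate the derived equation numerically.
    In the real system, this agent generates and executes code."""
    return p.weight_n * p.load_arm_m / p.muscle_arm_m

def explanation_agent(p: Problem, eq: str, result: float) -> str:
    """Agent 3: assemble an interpretable, step-by-step solution."""
    return (f"Derivation: {eq}\n"
            f"Substitution: F_muscle = {p.weight_n} N * {p.load_arm_m} m"
            f" / {p.muscle_arm_m} m\n"
            f"Result: F_muscle = {result:.1f} N")

def solve(p: Problem) -> str:
    """Coordinator: route the task through the specialized agents,
    keeping each intermediate step traceable and verifiable."""
    eq = derivation_agent(p)
    result = computation_agent(p)
    return explanation_agent(p, eq, result)

# Example: a 50 N weight held 0.35 m from the elbow, biceps arm 0.04 m.
print(solve(Problem(weight_n=50.0, load_arm_m=0.35, muscle_arm_m=0.04)))
```

Splitting the roles this way mirrors the traceability claim in the summary: each agent's output is inspectable on its own, so a wrong final answer can be attributed to a faulty derivation, computation, or explanation step.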