🤖 AI Summary
Problem: In stepwise problem solving, students often combine several steps into one. The number of possible solution paths between consecutive inputs then grows combinatorially, and conventional matching, which tries to connect consecutive inputs with a single rule, cannot reliably diagnose such steps. Method: This paper proposes an automated diagnosis approach that works from the final answer instead of enumerating solution paths. An intermediate input is automatically completed according to the task's solution strategy. The resulting final answer is then matched against the final answers that buggy rules produce, which localizes errors hidden in combined steps. Contribution/Results: The approach mitigates the combinatorial explosion because there are generally fewer possible (erroneous) final answers than (erroneous) solution paths. It also extends diagnosis to steps that a single-rule buggy rule service cannot handle. The approach was applied to 1,939 unique student steps for solving quadratic equations that such a service could not diagnose, and it diagnosed 29.4% of them. On a subset of 115 steps, its diagnoses agreed with teacher diagnoses in 97% of cases. This makes the approach a promising basis for further exploration.
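The claim that there are fewer final answers than solution paths can be illustrated with a toy model. This is a hypothetical sketch, not the paper's service. A quadratic equation a*x^2 + b*x + c = 0 is stored as exact coefficients. At every merged sub-step the student applies either a correct rule (dividing by a, which keeps the solutions) or one of two assumed buggy rules (a sign error in b or in c). The rule names and the use of the monic form (b/a, c/a) as a proxy for the final answer are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, Set, Tuple

# Hypothetical toy model: a*x^2 + b*x + c = 0 as exact coefficients (a != 0).
Equation = Tuple[Fraction, Fraction, Fraction]

# One correct rule and two hypothetical buggy rules a student might apply
# at any of the sub-steps merged into a single input.
RULES: Dict[str, Callable[[Equation], Equation]] = {
    "divide by a": lambda e: (Fraction(1), e[1] / e[0], e[2] / e[0]),  # correct
    "sign error in b": lambda e: (e[0], -e[1], e[2]),                 # buggy
    "sign error in c": lambda e: (e[0], e[1], -e[2]),                 # buggy
}


def final_answer(e: Equation) -> Tuple[Fraction, Fraction]:
    """Monic form (b/a, c/a): it fixes the roots uniquely, so it is at least
    as fine-grained as the student's final answer."""
    a, b, c = e
    return (b / a, c / a)


def count_paths_and_answers(task: Equation, depth: int) -> Tuple[int, int]:
    """Enumerate every rule sequence of the given length (every path) and
    count how many distinct final answers those paths lead to."""
    paths = 0
    answers: Set[Tuple[Fraction, Fraction]] = set()
    for sequence in product(RULES.values(), repeat=depth):
        state = task
        for rule in sequence:
            state = rule(state)
        paths += 1
        answers.add(final_answer(state))
    return paths, len(answers)


task: Equation = (Fraction(2), Fraction(-10), Fraction(12))  # 2x^2 - 10x + 12 = 0
for depth in (1, 2, 4, 6):
    print(depth, count_paths_and_answers(task, depth))
```

In this toy model the number of paths grows as 3^depth, while the paths collapse onto at most four final answers, because repeated sign errors cancel and dividing by a never changes the answer. Real rule sets are richer, so the gap will differ.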
📝 Abstract
Many intelligent tutoring systems can support a student in solving a stepwise task. When a student combines several steps into one, the number of possible paths connecting consecutive inputs may be very large. This combinatorial explosion makes error diagnosis hard. Using a final answer to diagnose a combination of steps can mitigate the combinatorial explosion, because there are generally fewer possible (erroneous) final answers than (erroneous) solution paths. An intermediate input for a task can be diagnosed by automatically completing it according to the task's solution strategy and then diagnosing the resulting solution. This study explores the potential of automated error diagnosis based on a final answer. We investigate the design of a service that provides a buggy rule diagnosis when a student combines several steps. To validate the approach, we apply the service to an existing dataset (n=1939) of unique student steps in solving quadratic equations. None of these steps could be diagnosed by a buggy rule service that tries to connect consecutive inputs with a single rule. Results show that final answer evaluation can diagnose 29.4% of these steps. Moreover, a comparison of the generated diagnoses with teacher diagnoses on a subset (n=115) shows that the diagnoses align in 97% of the cases. These results can be considered a basis for further exploration of the approach.
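The diagnosis idea can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' service. A quadratic equation is represented by its coefficients. The quadratic formula stands in for the task solution strategy that completes an input to a final answer, and the final answer is taken to be the set of real roots, rounded so that answers can be compared. The buggy rules and their names are hypothetical.

```python
import math
from typing import Callable, Dict, FrozenSet, Optional, Tuple

# Hypothetical representation: a*x^2 + b*x + c = 0 as its coefficients (a != 0).
Equation = Tuple[float, float, float]
FinalAnswer = FrozenSet[float]


def solve(eq: Equation) -> FinalAnswer:
    """Stand-in for the task solution strategy: complete an equation to its
    final answer, the set of real roots rounded to 9 decimals."""
    a, b, c = eq
    if a == 0:
        raise ValueError("this sketch only handles quadratic equations")
    d = b * b - 4 * a * c
    if d < 0:
        return frozenset()  # no real solutions
    r = math.sqrt(d)
    return frozenset({round((-b + r) / (2 * a), 9), round((-b - r) / (2 * a), 9)})


# Hypothetical buggy rules: each maps the task to the equation a student
# effectively solves after making that mistake.
BUGGY_RULES: Dict[str, Callable[[Equation], Equation]] = {
    "sign error in b": lambda e: (e[0], -e[1], e[2]),
    "sign error in c": lambda e: (e[0], e[1], -e[2]),
    "divided only the x^2 term by a": lambda e: (1.0, e[1], e[2]),
}


def diagnose(task: Equation, step: Equation) -> Optional[str]:
    """Diagnose a student step that may combine several steps."""
    # 1. Complete the student's input according to the solution strategy.
    answer = solve(step)
    # 2. Same final answer as the task itself: the step is correct.
    if answer == solve(task):
        return "correct"
    # 3. Otherwise find a buggy rule whose final answer matches; the paths
    #    between the task and the student's step are never enumerated.
    for name, rule in BUGGY_RULES.items():
        if answer == solve(rule(task)):
            return name
    return None  # not diagnosable with these rules


task = (2, -10, 12)               # 2x^2 - 10x + 12 = 0, roots {2, 3}
print(diagnose(task, (1, 5, 6)))  # student wrote x^2 + 5x + 6 = 0
# prints: sign error in b
```

The student's step combines dividing by 2 with a sign error, yet no path between the two equations has to be reconstructed. Only final answers are compared. In this sketch the first matching rule wins, so two buggy rules that produce the same final answer would give an ambiguous diagnosis that a real service would need to report or resolve.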