Buggy rule diagnosis for combined steps through final answer evaluation in stepwise tasks

📅 2025-07-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: In stepwise problem-solving, students often combine several steps into one without showing the intermediate work. The number of possible solution paths connecting two consecutive inputs then explodes combinatorially, and conventional matching that tries to connect inputs with a single rule fails to diagnose such errors reliably. Method: This paper proposes an automated diagnostic approach that works from the final answer instead of enumerating paths explicitly. An intermediate input is automatically completed according to the task's solution strategy, and the resulting final answer is matched against buggy rules to localize errors hidden in combined steps. Contribution/Results: Because there are generally fewer possible erroneous final answers than erroneous solution paths, the method mitigates the combinatorial explosion and broadens the range of errors that can be diagnosed. Applied to 1,939 unique student steps from quadratic-equation tasks that a single-rule buggy rule service could not diagnose, it diagnoses 29.4% of them. On a subset of 115 steps, the generated diagnoses align with teacher diagnoses in 97% of cases, which the authors present as a basis for further exploration.

📝 Abstract

Many intelligent tutoring systems can support a student in solving a stepwise task. When a student combines several steps in one step, the number of possible paths connecting consecutive inputs may be very large. This combinatorial explosion makes error diagnosis hard. Using a final answer to diagnose a combination of steps can mitigate the combinatorial explosion, because there are generally fewer possible (erroneous) final answers than (erroneous) solution paths. An intermediate input for a task can be diagnosed by automatically completing it according to the task solution strategy and diagnosing this solution. This study explores the potential of automated error diagnosis based on a final answer. We investigate the design of a service that provides a buggy rule diagnosis when a student combines several steps. To validate the approach, we apply the service to an existing dataset (n=1939) of unique student steps when solving quadratic equations, which could not be diagnosed by a buggy rule service that tries to connect consecutive inputs with a single rule. Results show that final answer evaluation can diagnose 29.4% of these steps. Moreover, a comparison of the generated diagnoses with teacher diagnoses on a subset (n=115) shows that the diagnoses align in 97% of the cases. These results can be considered a basis for further exploration of the approach.
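The idea of diagnosing from a final answer can be illustrated with a minimal Python sketch. It is not the service described in the paper: the function names, the three buggy rules, and the restriction to quadratics with perfect-square discriminants are all assumptions made for illustration. Each hypothetical buggy rule yields one erroneous final answer, so the student's answer is compared against a handful of candidate answers rather than against every path of intermediate steps.

```python
from fractions import Fraction
from math import isqrt


def solve_quadratic(a, b, c, neg_b_sign=-1, denom_factor=2):
    """Roots of a*x^2 + b*x + c = 0 via the quadratic formula.

    neg_b_sign and denom_factor are illustrative knobs that let the
    hypothetical buggy rules below perturb the formula. Only integer
    coefficients with a perfect-square discriminant are handled.
    """
    disc = b * b - 4 * a * c
    if disc < 0:
        return frozenset()
    root = isqrt(disc)
    if root * root != disc:
        raise ValueError("sketch only handles perfect-square discriminants")
    denom = denom_factor * a
    return frozenset({
        Fraction(neg_b_sign * b + root, denom),
        Fraction(neg_b_sign * b - root, denom),
    })


# Hypothetical buggy rules (assumed for this sketch, not taken from the paper).
# Each maps the task to the final answer a student would reach with that bug.
BUGGY_RULES = {
    "sign-error-in-minus-b": lambda a, b, c: solve_quadratic(a, b, c, neg_b_sign=1),
    "divided-by-a-instead-of-2a": lambda a, b, c: solve_quadratic(a, b, c, denom_factor=1),
    "dropped-second-root": lambda a, b, c: frozenset(sorted(solve_quadratic(a, b, c))[-1:]),
}


def diagnose_final_answer(a, b, c, student_answer):
    """Return the names of all rules whose final answer equals the student's."""
    student = frozenset(Fraction(x) for x in student_answer)
    if student == solve_quadratic(a, b, c):
        return ["correct"]
    return [name for name, rule in BUGGY_RULES.items() if rule(a, b, c) == student]


if __name__ == "__main__":
    # Task: x^2 - 5x + 6 = 0, whose correct roots are 2 and 3.
    print(diagnose_final_answer(1, -5, 6, [-2, -3]))
    print(diagnose_final_answer(1, -5, 6, [7]))
```

An answer that matches no rule returns an empty list, which corresponds to a step that remains undiagnosed. Several rules may also produce the same answer, so the function returns a list of candidates instead of a single diagnosis.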

Problem

Research questions and friction points this paper is trying to address.

Diagnosing errors in combined steps using final answers

Mitigating combinatorial explosion in stepwise task diagnosis

Validating buggy rule diagnosis for quadratic equation solving

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses final answer evaluation for error diagnosis

Automatically completes intermediate inputs for diagnosis

Validates approach with quadratic equations dataset
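Automatic completion of an intermediate input can be sketched as follows, again in Python and under stated assumptions. The task format (a*x^2 + b*x + c = d), the representation of a student step as standard-form coefficients, and the rule names are hypothetical and not taken from the paper. The student's intermediate equation is solved correctly to the end, and the resulting final answer is compared with the final answers that each candidate rule would lead to.

```python
from fractions import Fraction
from math import isqrt


def complete(a, b, c):
    """Finish a*x^2 + b*x + c = 0 correctly and return its set of roots.

    Assumption of this sketch: integer coefficients and a perfect-square
    discriminant, so all roots are rational.
    """
    disc = b * b - 4 * a * c
    if disc < 0:
        return frozenset()
    root = isqrt(disc)
    if root * root != disc:
        raise ValueError("sketch only handles perfect-square discriminants")
    return frozenset({Fraction(-b + root, 2 * a), Fraction(-b - root, 2 * a)})


# Hypothetical rules that rewrite the task a*x^2 + b*x + c = d into
# standard form (a, b, c') with right-hand side 0.
STANDARD_FORM_RULES = {
    "correct": lambda a, b, c, d: (a, b, c - d),
    "sign-error-moving-constant": lambda a, b, c, d: (a, b, c + d),
    "dropped-right-hand-side": lambda a, b, c, d: (a, b, c),
}


def diagnose_intermediate(task, student_step):
    """Complete the student's intermediate equation, then match final answers."""
    student_final = complete(*student_step)
    return [
        name
        for name, rule in STANDARD_FORM_RULES.items()
        if complete(*rule(*task)) == student_final
    ]


if __name__ == "__main__":
    task = (1, -5, 0, -6)  # x^2 - 5x = -6
    print(diagnose_intermediate(task, (1, -5, -6)))   # student wrote x^2 - 5x - 6 = 0
    print(diagnose_intermediate(task, (2, -10, 12)))  # student wrote 2x^2 - 10x + 12 = 0
```

Because the comparison happens on final answers, an intermediate input written in an equivalent but differently scaled form (the second call) is still recognized as correct without enumerating the steps that produced it.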
