EcoScratch: Cost-Effective Multimodal Repair for Scratch Using Execution Feedback

📅 2026-03-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the repair of Scratch programs, whose correctness is defined by observable stage behavior and often breaks at runtime because of timing, event ordering, or cross-sprite interactions, failures that conventional repair techniques handle poorly. The authors propose EcoScratch, an adaptive repair pipeline in which a controller uses lightweight runtime signals to decide whether the next attempt stays text-only or escalates to multimodal prompting, and jointly sets the JSON Patch budget and verification effort. The pipeline combines text-based repair, JSON Patch generation, .sb3 project reconstruction, recorded stage execution, and multimodal large language models. Across 12 models, 100 executable Scratch projects, and four controller settings (4800 repair trajectories), the selective multimodal policy reaches the highest generation success (30.3%) while using less average cost and local-runtime energy than the two non-adaptive multimodal baselines; text-only remains the lowest-cost option.
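
To make the JSON Patch and .sb3 reconstruction steps concrete, here is a minimal sketch in Python. It is not the authors' implementation. It relies on two general facts: an .sb3 file is a ZIP archive whose program lives in `project.json`, and JSON Patch operations address values by JSON Pointer paths. The helper names `apply_patch` and `rebuild_sb3` are hypothetical, and only the `add`, `replace`, and `remove` operations on non-root paths are handled.

```python
import io
import json
import zipfile


def _unescape(token: str) -> str:
    # JSON Pointer escaping: "~1" decodes to "/" first, then "~0" to "~".
    return token.replace("~1", "/").replace("~0", "~")


def apply_patch(doc, ops):
    """Apply a subset of JSON Patch (add/replace/remove) to doc in place.

    Illustrative only: root-level paths and the move/copy/test ops are omitted.
    """
    for op in ops:
        tokens = [_unescape(t) for t in op["path"].split("/")[1:]]
        parent = doc
        for tok in tokens[:-1]:
            parent = parent[int(tok)] if isinstance(parent, list) else parent[tok]
        last, kind = tokens[-1], op["op"]
        if isinstance(parent, list):
            # "-" means "append at the end of the array" for add operations.
            idx = len(parent) if last == "-" else int(last)
            if kind == "add":
                parent.insert(idx, op["value"])
            elif kind == "replace":
                parent[idx] = op["value"]
            elif kind == "remove":
                del parent[idx]
            else:
                raise ValueError(f"unsupported op: {kind}")
        else:
            if kind in ("add", "replace"):
                parent[last] = op["value"]
            elif kind == "remove":
                del parent[last]
            else:
                raise ValueError(f"unsupported op: {kind}")
    return doc


def rebuild_sb3(sb3_bytes: bytes, ops: list) -> bytes:
    """Return a new .sb3 archive with the patch applied to project.json.

    Every other archive member (costumes, sounds) is copied unchanged.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(sb3_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "project.json":
                project = apply_patch(json.loads(data), ops)
                data = json.dumps(project).encode("utf-8")
            dst.writestr(info.filename, data)
    return out.getvalue()
```

A candidate fix such as `[{"op": "replace", "path": "/targets/1/x", "value": 10}]` would be passed as `ops`; the returned bytes can be written to disk as a project file for the verification step. The path in this example is a made-up illustration, not taken from the benchmark.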

📝 Abstract

Scratch is the most popular programming environment for novices, with over 1.15 billion projects created worldwide. Unlike traditional languages, correctness in Scratch is defined by visible behavior on the stage rather than by code structure alone, so programs that appear correct in the workspace can still fail at runtime due to timing, event ordering, or cross-sprite interactions. Visual execution evidence such as gameplay videos can therefore be essential for diagnosis and repair. However, capturing and processing this evidence inside an automated repair loop introduces substantial overhead. Probing execution, recording stage behavior, rebuilding executable .sb3 projects, and verifying candidate fixes consume time, monetary cost, and resources across an entire repair trajectory rather than a single model call. We present EcoScratch, a repair pipeline that uses lightweight runtime signals to decide whether the next attempt stays text-only or escalates to multimodal prompting. The controller also sets the JSON Patch budget and verification effort, so evidence choice and repair budget are coupled inside the same decision. EcoScratch rebuilds candidate fixes into executable .sb3 projects and records per-trajectory traces, monetary cost, and local-runtime energy. We evaluate 12 models on 100 executable Scratch repair projects under four controller settings, yielding 4800 repair trajectories. In this matrix, a selective multimodal policy gives the strongest observed success-cost-energy tradeoff. It reaches the highest generation success (30.3%) while using less average cost and local-runtime energy than the two non-adaptive multimodal baselines under the same bounded trajectory budget; text-only remains the lowest-cost floor. Across the evaluated matrix, multimodal evidence helps most when it is used to control escalation within a bounded trajectory budget rather than applied uniformly.
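
The coupled decision described in the abstract (evidence mode, JSON Patch budget, and verification effort chosen together) can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the abstract does not name the runtime signals, thresholds, or budget values, so the fields of `RuntimeSignals`, the `escalate_after` parameter, and the numbers below are hypothetical and are not EcoScratch's actual policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeSignals:
    # Hypothetical stand-ins for the "lightweight runtime signals".
    probe_crashed: bool        # the probe run raised an error visible in logs
    stage_diverged: bool       # observed stage behavior differs from expected
    failed_text_attempts: int  # text-only attempts already spent


@dataclass(frozen=True)
class Decision:
    multimodal: bool   # attach visual evidence (e.g. recorded stage frames)?
    patch_budget: int  # max JSON Patch operations for the next attempt
    verify_runs: int   # how many verification runs to spend on the candidate


def decide(signals: RuntimeSignals, escalate_after: int = 2) -> Decision:
    """Pick evidence mode and budgets in a single step (illustrative rule)."""
    if signals.probe_crashed:
        # Assumed: a crash is already described by text logs, so stay cheap.
        return Decision(multimodal=False, patch_budget=4, verify_runs=1)
    if signals.stage_diverged and signals.failed_text_attempts >= escalate_after:
        # Assumed: behavior is wrong and text-only has stalled, so escalate
        # and raise the patch and verification budgets along with it.
        return Decision(multimodal=True, patch_budget=8, verify_runs=3)
    return Decision(multimodal=False, patch_budget=4, verify_runs=1)
```

For example, `decide(RuntimeSignals(False, True, 2))` escalates to multimodal prompting with the larger budgets, while `decide(RuntimeSignals(False, True, 0))` stays text-only. The point of the sketch is the shape of the decision: one function returns all three settings, so evidence choice and repair budget cannot drift apart.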

Problem

Research questions and friction points this paper is trying to address.

Scratch repair

multimodal evidence

execution feedback

cost-effective

program correctness

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal repair

execution feedback

adaptive repair policy