🤖 AI Summary
This study addresses the end-to-end generation of Python code from Bangla-language instructions—a challenging low-resource language task. We propose a multi-agent collaborative framework that decouples code generation from debugging: only failing test cases trigger targeted error analysis and repair. The framework comprises two LLM-driven agents—a generation agent and a debugging agent—integrated with pytest-style assertion-based testing, stack-trace parsing, and conditional re-generation. This design significantly improves both repair efficiency and functional correctness. Evaluated on the BLP-2025 shared task, our approach achieves 95.4% Pass@1, ranking first and demonstrating strong effectiveness and robustness for code generation in low-resource linguistic settings. All source code is publicly released.
📝 Abstract
This paper presents JGU Mainz's winning system for the BLP-2025 Shared Task on Code Generation from Bangla Instructions. We propose a multi-agent pipeline. First, a code-generation agent produces an initial solution from the input instruction. The candidate program is then executed against the provided unit tests (pytest-style, assert-based). Only the failing cases are forwarded to a debugger agent, which reruns the tests, extracts error traces, and, conditioned on the error messages, the current program, and the relevant test cases, generates a revised solution. Using this approach, our submission achieved first place in the shared task with a Pass@1 score of 95.4. We make our code publicly available.
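The test-and-repair loop described above can be sketched as follows. This is a minimal illustration, not the authors' released code: `run_tests` executes assert-based test cases against a candidate program and collects failures with their tracebacks, and `repair_loop` invokes a hypothetical `debug_agent` callable (standing in for the LLM-driven debugger) only when failures remain.

```python
import traceback


def run_tests(program_src: str, test_cases: list[str]) -> list[dict]:
    """Execute assert-based test cases against a candidate program.

    Returns a list of failure records (test source plus error trace);
    an empty list means every test passed.
    """
    namespace: dict = {}
    exec(program_src, namespace)  # load the candidate solution
    failures = []
    for test in test_cases:
        try:
            exec(test, namespace)
        except Exception:
            failures.append({"test": test, "trace": traceback.format_exc()})
    return failures


def repair_loop(instruction, program, tests, debug_agent, max_rounds=3):
    """Conditionally re-generate: only failing tests trigger the debugger.

    `debug_agent` is a hypothetical callable that takes the instruction,
    the current program, and the failure records, and returns revised code.
    """
    for _ in range(max_rounds):
        failures = run_tests(program, tests)
        if not failures:
            return program  # all tests pass; no debugging needed
        program = debug_agent(instruction, program, failures)
    return program
```

Keeping the debugger conditional on failures, as in `repair_loop`, avoids spending LLM calls on programs that already pass all tests.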