NALA_MAINZ at BLP-2025 Task 2: A Multi-agent Approach for Bangla Instruction to Python Code Generation

📅 2025-11-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the end-to-end generation of Python code from Bangla-language instructions—a challenging low-resource language task. We propose a multi-agent collaborative framework that decouples code generation from debugging: only failing test cases trigger targeted error analysis and repair. The framework comprises two LLM-driven agents—a generation agent and a debugging agent—integrated with pytest-style assertion-based testing, stack-trace parsing, and conditional re-generation. This design significantly improves both repair efficiency and functional correctness. Evaluated on the BLP-2025 shared task, our approach achieves 95.4% Pass@1, ranking first and demonstrating strong effectiveness and robustness for code generation in low-resource linguistic settings. All source code is publicly released.
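The stack-trace parsing mentioned above can be illustrated with a small helper that pulls the final exception line out of a Python traceback. This is a minimal stand-in for whatever parsing the authors implement, not their actual code; the function name and heuristic are assumptions.

```python
import re

def extract_error(stderr: str) -> str:
    """Return the final exception line (e.g. 'AssertionError: ...') from a
    Python traceback. Hypothetical sketch of the kind of stack-trace
    parsing the summary describes; the real system may extract more."""
    lines = stderr.strip().splitlines()
    # Scan from the bottom: the last '<Something>Error...' line summarizes
    # the failure that the debugging agent would condition on.
    for line in reversed(lines):
        if re.match(r"^\w+Error\b", line):
            return line
    return lines[-1] if lines else ""
```

A usage example: feeding it the stderr of a failed test run yields a compact error message suitable for inclusion in a repair prompt.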

📝 Abstract
This paper presents JGU Mainz's winning system for the BLP-2025 Shared Task on Code Generation from Bangla Instructions. We propose a multi-agent pipeline. First, a code-generation agent produces an initial solution from the input instruction. The candidate program is then executed against the provided unit tests (pytest-style, assert-based). Only the failing cases are forwarded to a debugger agent, which reruns the tests, extracts error traces, and, conditioned on the error messages, the current program, and the relevant test cases, generates a revised solution. Using this approach, our submission achieved first place in the shared task with a Pass@1 score of 95.4. We also make our code public.
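The abstract's generate-test-debug loop can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: `gen_agent` and `debug_agent` stand in for the two LLM-driven agents, and their call signatures are assumptions.

```python
import subprocess
import sys
import tempfile

def run_tests(program: str, tests: str) -> tuple[bool, str]:
    """Execute the candidate program plus its assert-based tests in a
    subprocess; return (passed, stderr) so error traces can be reused."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program + "\n\n" + tests)
        path = f.name
    proc = subprocess.run([sys.executable, path],
                          capture_output=True, text=True, timeout=30)
    return proc.returncode == 0, proc.stderr

def generate_and_debug(instruction, tests, gen_agent, debug_agent,
                       max_rounds=3):
    """Two-agent loop: the debugger is invoked only when tests fail
    (conditional re-generation, as described in the abstract)."""
    program = gen_agent(instruction)
    for _ in range(max_rounds):
        passed, trace = run_tests(program, tests)
        if passed:
            return program
        # The debugger conditions on the error trace, the current
        # program, and the test cases (hypothetical agent interface).
        program = debug_agent(program, trace, tests)
    return program
```

In the real system each agent is an LLM call; here any callables with the same shape will do, which also makes the loop easy to unit-test.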
Problem

Research questions and friction points this paper is trying to address.

Generating Python code from Bangla natural language instructions
Debugging failing code using error traces and test cases
Improving code accuracy through multi-agent iterative refinement
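To make the testing setup concrete: the shared task's unit tests are pytest-style, assert-based checks run against the generated solution. The instruction and function below are invented for illustration only.

```python
# Hypothetical Bangla instruction (translated): "Return the factorial of n."
def factorial(n):
    """Candidate solution a generation agent might produce."""
    return 1 if n <= 1 else n * factorial(n - 1)

# Assert-based checks of the kind described in the task; any failing
# assertion raises AssertionError, whose trace goes to the debugger agent.
assert factorial(0) == 1
assert factorial(5) == 120
```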
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent pipeline for code generation
Debugger agent fixes failing test cases
Conditional revision using error messages
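The last point, conditional revision, hinges on assembling the debugger agent's input from the three pieces the paper names: error messages, the current program, and the relevant test cases. A minimal sketch of such prompt assembly, with wording that is purely illustrative (the authors' actual prompt is not reproduced here):

```python
def build_repair_prompt(program: str, trace: str,
                        failing_tests: list[str]) -> str:
    """Assemble a repair prompt from the error trace, the current
    program, and the failing test cases. Hypothetical wording; the
    real system's prompt template may differ."""
    tests = "\n".join(failing_tests)
    return (
        "The following Python program fails some of its unit tests.\n\n"
        f"Program:\n{program}\n\n"
        f"Failing tests:\n{tests}\n\n"
        f"Error trace:\n{trace}\n\n"
        "Return a corrected program that passes all tests."
    )
```

The resulting string would be sent to the debugger LLM, whose reply replaces the current candidate before the tests are rerun.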