MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction

📅 2026-04-02
🤖 AI Summary
Existing approaches to chart-to-code generation rely heavily on supervised fine-tuning and lack interaction with the execution environment, limiting their ability to effectively self-correct. This work proposes MM-ReCoder, a novel framework that integrates multimodal large language models with a two-stage reinforcement learning strategy based on GRPO. The first stage enhances the model's capacity for multi-turn self-correction, while the second stage refines overall code generation quality, with both stages iteratively optimized using feedback from code execution. Evaluated on three mainstream chart-to-code benchmarks, MM-ReCoder achieves state-of-the-art performance, significantly improving the accuracy and executability of the generated code.
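To make the idea of self-correction from execution feedback concrete, here is a minimal Python sketch of a multi-turn loop. It is an illustration under stated assumptions, not the paper's implementation: `run_code`, `self_correct`, and `toy_generate` are hypothetical names, the "model" is a hard-coded stand-in, and the only feedback used is the error text from a failed run (the summary does not say what form the feedback takes in MM-ReCoder).

```python
import traceback
from typing import Callable, List, Optional, Tuple


def run_code(code: str) -> Optional[str]:
    """Execute code in a fresh namespace.

    Returns None on success, or the formatted error text on failure.
    Assumption: a real system would run this in a sandboxed subprocess.
    """
    try:
        exec(compile(code, "<generated>", "exec"), {"__name__": "__generated__"})
        return None
    except Exception:
        return traceback.format_exc(limit=1)


def self_correct(
    generate: Callable[[List[str]], str], max_turns: int = 2
) -> Tuple[str, bool, int]:
    """Request code from `generate`, passing back execution errors each turn.

    Returns (last code, whether it executed, number of turns used).
    """
    feedback: List[str] = []
    code = ""
    for turn in range(1, max_turns + 1):
        code = generate(feedback)
        error = run_code(code)
        if error is None:
            return code, True, turn
        feedback.append(error)  # the next turn sees why this one failed
    return code, False, max_turns


def toy_generate(feedback: List[str]) -> str:
    """Hypothetical stand-in for the model: fixes a typo once it sees an error."""
    if not feedback:
        return "values = [1, 2, 3]\ntotal = sum(valuez)"  # NameError on purpose
    return "values = [1, 2, 3]\ntotal = sum(values)"


final_code, ok, turns = self_correct(toy_generate)
```

In this toy run the first turn fails with a `NameError` and the second turn succeeds. Using `exec` keeps the sketch self-contained, but it is not safe for untrusted model output.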
📝 Abstract
Multimodal Large Language Models (MLLMs) have recently demonstrated promising capabilities in multimodal coding tasks such as chart-to-code generation. However, existing methods primarily rely on supervised fine-tuning (SFT), which requires the model to learn code patterns through chart-code pairs but does not expose the model to a code execution environment. Moreover, while self-correction through execution feedback offers a potential route to improve coding quality, even state-of-the-art MLLMs have been shown to struggle with effective self-correction. In this work, we introduce MM-ReCoder, a chart-to-code generation model trained with reinforcement learning (RL) and equipped with self-correction ability. We propose a two-stage multi-turn self-correction RL strategy based on Group Relative Policy Optimization (GRPO). The first stage enhances the model's self-correction ability via rolling out a shared first turn, while the second stage improves the coding capability with full-trajectory optimization. MM-ReCoder learns to produce more accurate and executable code through the interaction with the environment and by iteratively correcting its own outputs. Our results on three chart-to-code benchmarks demonstrate the state-of-the-art performance of MM-ReCoder.
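The core of GRPO is that each sampled response is scored relative to the other responses drawn for the same prompt, rather than against a learned value function. The Python sketch below shows only that group-relative normalization step. It assumes scalar rewards per rollout and the population standard deviation; the function name, the `eps` term, and the example reward values are illustrative, and the abstract does not specify how MM-ReCoder defines its rewards or forms its groups.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward by the mean and std of its own group.

    `rewards` holds one scalar per rollout sampled for the same prompt.
    `eps` (an assumption) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one chart image,
# e.g. 0.0 for code that fails to run and 1.0 for a close visual match.
rewards = [0.0, 0.5, 1.0, 0.5]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat the group average receive a positive advantage and the rest a non-positive one. Under the first-stage description above, one plausible reading is that such a group consists of several second-turn corrections that all continue from the same first turn, but that mapping is an assumption here.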
Problem

Research questions and friction points this paper is trying to address.

chart-to-code generation
multimodal large language models
self-correction
code execution
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

reinforcement learning
self-correction
chart-to-code generation
multimodal LLM
GRPO