CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement

📅 2024-11-07
🏛️ arXiv.org
📈 Citations: 4
Influential: 2
🤖 AI Summary
To address the limited code-generation performance of small open-source LLMs (e.g., Llama-3-8B), this paper proposes a preference-guided iterative refinement method. It is the first to jointly model correct and incorrect code attempts as fine-grained preference signals, mitigating the bias inherent in conventional supervision that uses only positive examples. Inspired by reinforcement learning, the approach establishes a lightweight framework of contrastive output evaluation and fine-tuning that requires no large-model distillation, extensive human annotation, or auxiliary models. Evaluated on data science programming tasks, the method raises Llama-3-8B's accuracy from 28.2% to 48.6% using only 500 training samples. This substantially narrows the performance gap with GPT-4, empirically demonstrating the critical role of failure cases in aligning code-generation capability.
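The summary describes mining both successful and failed generations for contrastive signal. A minimal sketch of that collection loop is below; `generate` and `passes_tests` are hypothetical stand-ins for the model's sampler and a unit-test checker, not names from the paper.

```python
def collect_preference_pairs(tasks, generate, passes_tests, n_samples=8):
    """For each task, sample candidate programs and pair passing
    (chosen) attempts against failing (rejected) ones."""
    pairs = []
    for task in tasks:
        candidates = [generate(task) for _ in range(n_samples)]
        correct = [c for c in candidates if passes_tests(task, c)]
        incorrect = [c for c in candidates if not passes_tests(task, c)]
        # Only tasks that yield both outcomes produce a contrastive pair.
        for good in correct:
            for bad in incorrect:
                pairs.append({"prompt": task, "chosen": good, "rejected": bad})
    return pairs
```

Because pairs come from the model's own samples graded by execution, no auxiliary reward model or human labeling is needed, matching the lightweight setup the summary emphasizes.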

📝 Abstract
Large Language Models (LLMs) have revolutionized code generation but require significant resources and often over-generalize, limiting their task-specific efficiency. Fine-tuning smaller, open-source LLMs provides a cost-effective alternative. However, standard supervised approaches rely only on correct examples, missing valuable insights from failures. We introduce CodeLutra, a framework that leverages both correct and incorrect code attempts. Instead of using only correct solutions, CodeLutra applies iterative preference-based refinement, comparing successful and failed outputs to better approximate desired results. This approach narrows the performance gap with state-of-the-art larger models without requiring massive datasets or auxiliary models. For instance, on a challenging data science coding task, using only 500 samples improved Llama-3-8B's accuracy from 28.2% to 48.6%, approaching GPT-4's level. By learning from both successes and mistakes, CodeLutra provides a scalable and efficient path to high-quality code generation, making smaller open-source models more competitive with leading closed-source alternatives.
Problem

Research questions and friction points this paper is trying to address.

Improving code generation efficiency in smaller LLMs
Leveraging both correct and incorrect code attempts
Reducing performance gap with larger models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages both correct and incorrect code attempts
Applies iterative preference-based refinement
Improves accuracy with minimal samples
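The preference-based refinement above is commonly instantiated with a DPO-style objective; the paper's exact loss may differ, so the following is a sketch under that assumption, computing the loss for one (chosen, rejected) pair from summed log-probabilities under the policy and a frozen reference model.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss for one preference pair. `beta` scales how strongly
    the policy is pushed to prefer the correct code over the incorrect one,
    relative to the frozen reference model."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    # -log(sigmoid(margin)): small when the policy already prefers the chosen code.
    return math.log(1.0 + math.exp(-margin))
```

Iterating this over freshly collected pairs, then regenerating with the updated policy, gives the refinement loop: each round's failures become the next round's rejected examples.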