WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models

📅 2024-12-23
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Code large language models (LLMs) depend critically on high-quality fine-tuning data, yet manual annotation is prohibitively expensive, and existing data-flywheel approaches over-rely on a small set of proprietary LLMs (e.g., GPT-4, Claude) for data augmentation, limiting diversity and introducing systemic biases. Method: WarriorCoder is a novel paradigm that learns from expert battles. It creates an arena in which leading expert code LLMs challenge one another, with impartial judges evaluating each battle; this competitive framework constructs novel, diverse training data from scratch, leveraging the strengths of all participants without invoking any proprietary LLM. Contribution/Results: WarriorCoder achieves state-of-the-art performance on code benchmarks compared to previous models of the same size, while eliminating dependency on proprietary models for data construction.

📝 Abstract
Despite recent progress achieved by code large language models (LLMs), their remarkable abilities are largely dependent on fine-tuning on high-quality data, posing challenges for data collection and annotation. To address this, current methods often design various data flywheels to collect complex code instructions, enabling models to handle more intricate tasks. However, these approaches typically rely on off-the-shelf datasets and data augmentation from a limited set of proprietary LLMs (e.g., Claude, GPT-4, and so on), which restricts the diversity of the constructed data and makes it prone to systemic biases. In this paper, we propose WarriorCoder, a novel paradigm that learns from expert battles to address these limitations. Specifically, we create an arena where leading expert code LLMs challenge each other, with evaluations conducted by impartial judges. This competitive framework generates novel training data from scratch, leveraging the strengths of all participants. Experimental results show that WarriorCoder achieves state-of-the-art performance compared to previous models of the same size, even without relying on proprietary LLMs.
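The arena described above can be sketched as a simple data-generation loop: each expert model answers the same instruction, judge models vote on the candidates, and the winning response becomes a new training pair. This is a minimal illustrative sketch under assumed interfaces, not the paper's actual implementation; the model stand-ins and the majority-vote rule are placeholders.

```python
from collections import Counter

def battle(instruction, experts, judges):
    """One expert battle: collect a candidate answer per expert,
    let judges vote, and keep the winning response as training data."""
    candidates = {name: model(instruction) for name, model in experts.items()}
    # Each judge returns the name of the expert it considers the winner.
    votes = Counter(judge(instruction, candidates) for judge in judges)
    winner, _ = votes.most_common(1)[0]
    return {"instruction": instruction, "response": candidates[winner]}

# Toy stand-ins for expert code LLMs and impartial judge models.
experts = {
    "expert_a": lambda ins: f"# {ins}\ndef solve(): return 1",
    "expert_b": lambda ins: f"# {ins}\ndef solve(): return 2",
}
judges = [
    lambda ins, cands: "expert_a",
    lambda ins, cands: "expert_a",
    lambda ins, cands: "expert_b",
]

sample = battle("Write a function that returns a constant.", experts, judges)
```

In practice the experts would be distinct open code LLMs, the judges separate evaluator models, and the resulting pairs would feed a fine-tuning or preference-training stage.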
Problem

Research questions and friction points this paper is trying to address.

Enhance diversity of code LLM training data
Reduce systemic data biases
Generate novel training data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expert battles generate novel data
Competitive framework enhances model training
Impartial judges ensure data quality