WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions

📅 2023-04-24
🏛️ International Conference on Learning Representations
📈 Citations: 932
Influential: 140
🤖 AI Summary
High-quality, high-complexity instruction data are costly and labor-intensive to construct manually, limiting scale and diversity. To address this, we propose Evol-Instruct—a novel framework that leverages large language models (LLMs) to iteratively evolve initial instructions through multiple rounds of automated refinement, yielding instruction data with graded complexity and semantic diversity. Using this data, we fine-tune LLaMA to develop WizardLM, an open-source instruction-tuned model. This work pioneers the AI-driven self-evolution paradigm for instruction generation and provides the first empirical evidence that AI-generated instructions significantly outperform human-authored ones on complex tasks. Human evaluation shows WizardLM achieves higher preference rates than ChatGPT on complex instruction-following tasks; GPT-4-based automated evaluation indicates it reaches ≥90% of ChatGPT’s performance on 17 out of 29 evaluated capabilities. All code, data, and models are publicly released.
📝 Abstract
Training large language models (LLMs) with open-domain instruction following data brings colossal success. However, manually creating such instruction data is very time-consuming and labor-intensive. Moreover, humans may struggle to produce high-complexity instructions. In this paper, we show an avenue for creating large amounts of instruction data with varying levels of complexity using LLM instead of humans. Starting with an initial set of instructions, we use our proposed Evol-Instruct to rewrite them step by step into more complex instructions. Then, we mix all generated instruction data to fine-tune LLaMA. We call the resulting model WizardLM. Human evaluations on a complexity-balanced test bed and Vicuna's test set show that instructions from Evol-Instruct are superior to human-created ones. By analyzing the human evaluation results of the high-complexity part, we demonstrate that outputs from our WizardLM are preferred to outputs from OpenAI ChatGPT. In GPT-4 automatic evaluation, WizardLM achieves more than 90% capacity of ChatGPT on 17 out of 29 skills. Even though WizardLM still lags behind ChatGPT in some aspects, our findings suggest that fine-tuning with AI-evolved instructions is a promising direction for enhancing LLMs. Our code and data are public at https://github.com/nlpxucan/WizardLM
Problem

Research questions and friction points this paper is trying to address.

Automating creation of complex instruction data for LLMs
Enhancing LLMs via AI-evolved instructions over human-written ones
Improving instruction-following capability in models like LLaMA
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLM to generate complex instruction data
Employs Evol-Instruct for stepwise complexity increase
Fine-tunes LLaMA with AI-evolved mixed instructions
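The evolution loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the prompt templates below are paraphrased stand-ins for the real Evol-Instruct prompts (the exact wording, the full set of in-depth operations, and the elimination filters are in the paper and repository), and `llm` is assumed to be any callable that maps a prompt string to a generated instruction.

```python
import random
from typing import Callable, List

# Illustrative stand-ins for Evol-Instruct's two evolution modes.
# In-depth evolving makes an instruction harder; in-breadth evolving
# creates a new instruction in the same topic to widen coverage.
IN_DEPTH_PROMPTS = [
    "Add one more constraint or requirement to this instruction:\n{instruction}",
    "Rewrite this instruction so it requires multi-step reasoning:\n{instruction}",
]
IN_BREADTH_PROMPT = (
    "Create a brand-new, rarer instruction in the same domain as this one:"
    "\n{instruction}"
)

def evolve_pool(seed_instructions: List[str],
                llm: Callable[[str], str],
                rounds: int = 4,
                rng: random.Random = random.Random(0)) -> List[str]:
    """Iteratively evolve an instruction pool, keeping every generation
    so the final mix spans graded complexity levels, as the paper does
    when it mixes all generated data to fine-tune LLaMA."""
    pool = list(seed_instructions)
    frontier = list(seed_instructions)
    for _ in range(rounds):
        next_frontier = []
        for inst in frontier:
            if rng.random() < 0.8:  # mostly deepen, sometimes broaden
                prompt = rng.choice(IN_DEPTH_PROMPTS).format(instruction=inst)
            else:
                prompt = IN_BREADTH_PROMPT.format(instruction=inst)
            evolved = llm(prompt)
            # The paper eliminates failed evolutions (e.g. copies of the
            # input or degenerate output); a trivial check stands in here.
            if evolved and evolved != inst:
                next_frontier.append(evolved)
        pool.extend(next_frontier)  # keep all generations in the mix
        frontier = next_frontier    # evolve only the newest generation next
    return pool
```

With four rounds and a non-trivial `llm`, each seed yields a chain of progressively harder variants, and the returned pool contains every generation, which is what gives the training mix its range of complexity.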