Review-Instruct: A Review-Driven Multi-Turn Conversations Generation Method for Large Language Models

📅 2025-05-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the weak contextual coherence of large language models (LLMs) in multi-turn dialogue caused by single-turn supervised fine-tuning, this paper proposes an iterative "Ask-Respond-Review" dialogue-generation framework with three agent roles: a Candidate, multiple Reviewers, and a Chairman. Its multi-agent collaborative review mechanism uses dynamic feedback from the Reviewers to controllably increase instruction diversity and difficulty; instruction rewriting and iterative refinement are combined to construct a high-quality multi-turn dialogue dataset (built from the Alpaca dataset and used to fine-tune LLaMA2-13B). Experiments show absolute gains of 2.0% on MT-Bench and 2.9% on MMLU-Pro over same-scale baselines, and ablation studies confirm that both the multi-Reviewer design and the Review stage are critical to the overall performance improvements.

📝 Abstract
The effectiveness of large language models (LLMs) in conversational AI is hindered by their reliance on single-turn supervised fine-tuning (SFT) data, which limits contextual coherence in multi-turn dialogues. Existing methods for generating multi-turn dialogue data struggle to ensure both diversity and quality in instructions. To address this, we propose Review-Instruct, a novel framework that synthesizes multi-turn conversations through an iterative "Ask-Respond-Review" process involving three agent roles: a Candidate, multiple Reviewers, and a Chairman. The framework iteratively refines instructions by incorporating Reviewer feedback, enhancing dialogue diversity and difficulty. We construct a multi-turn dataset using the Alpaca dataset and fine-tune the LLaMA2-13B model. Evaluations on MT-Bench, MMLU-Pro, and Auto-Arena demonstrate significant improvements, achieving absolute gains of 2.9% on MMLU-Pro and 2% on MT-Bench compared to prior state-of-the-art models based on LLaMA2-13B. Ablation studies confirm the critical role of the Review stage and the use of multiple Reviewers in boosting instruction diversity and difficulty. Our work highlights the potential of review-driven, multi-agent frameworks for generating high-quality conversational data at scale.
Problem

Research questions and friction points this paper is trying to address.

Enhancing multi-turn dialogue coherence in LLMs
Improving diversity and quality of dialogue instructions
Generating high-quality conversational data at scale
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative Ask-Respond-Review process enhances dialogues
Multi-agent framework with Candidate, Reviewers, Chairman
Review-Instruct boosts instruction diversity and difficulty
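The iterative Ask-Respond-Review loop described above can be sketched in a few lines of Python. This is an illustrative skeleton only, assuming the high-level role descriptions from the abstract; the agent functions here are hypothetical stand-ins (simple string transforms), not the paper's actual LLM prompts or implementation.

```python
# Illustrative sketch of the "Ask-Respond-Review" loop (Candidate, multiple
# Reviewers, Chairman). All agent functions below are hypothetical stubs.

def candidate_respond(instruction: str) -> str:
    # Hypothetical Candidate: answer the current instruction.
    return f"answer({instruction})"

def reviewer_feedback(instruction: str, response: str, reviewer_id: int) -> str:
    # Hypothetical Reviewer: critique the exchange and suggest how to
    # deepen or diversify the next instruction.
    return f"reviewer{reviewer_id}: deepen '{instruction}'"

def chairman_rewrite(instruction: str, feedback: list[str]) -> str:
    # Hypothetical Chairman: aggregate Reviewer feedback into the
    # next-turn instruction, increasing diversity and difficulty.
    return f"{instruction} [refined x{len(feedback)}]"

def review_instruct(seed_instruction: str, turns: int = 3, n_reviewers: int = 2):
    """Generate one multi-turn dialogue via Ask-Respond-Review iterations."""
    dialogue = []
    instruction = seed_instruction
    for _ in range(turns):
        response = candidate_respond(instruction)              # Respond
        dialogue.append((instruction, response))
        feedback = [reviewer_feedback(instruction, response, i)
                    for i in range(n_reviewers)]               # Review
        instruction = chairman_rewrite(instruction, feedback)  # Ask (next turn)
    return dialogue

conv = review_instruct("Explain overfitting", turns=3)
```

In the paper's setting, each stub would be an LLM call, and the generated (instruction, response) pairs across turns form one training conversation for SFT.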
👥 Authors
Jiangxu Wu (unknown affiliation)
Cong Wang (OPPO AI Center)
Tianhuang Su (OPPO AI Center)
Jun Yang (OPPO AI Center)
Haozhi Lin (OPPO AI Center)
Chao Zhang (OPPO AI Center)
Ming Peng (OPPO AI Center)
Kai Shi (Microsoft)
SongPan Yang (OPPO AI Center)
BinQing Pan (OPPO AI Center)
ZiXian Li (OPPO AI Center)
Ni Yang
ZhenYu Yang