Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis

📅 2025-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Maintaining consistent quality in synthetic instruction-tuning data remains a critical challenge. Existing approaches rely on sample-level feedback, which is costly to collect and generalizes poorly. This paper introduces Reference-Level Feedback, a mechanism that uses carefully curated, high-quality seed samples as references: feedback capturing their desirable characteristics is extracted once and then reused to guide and refine newly synthesized data. By raising feedback granularity from the sample level to the reference level, the approach requires significantly fewer feedback collections while improving quality control. Supervised fine-tuning Llama-3.1-8B-Instruct on the resulting REFED dataset (10K instruction-response pairs) yields state-of-the-art performance among similarly sized SFT-based models on AlpacaEval 2.0 and strong results on Arena-Hard, demonstrating both effectiveness and generalization across model architectures.

📝 Abstract
LLMs demonstrate remarkable capabilities in following natural language instructions, largely due to instruction-tuning on high-quality datasets. While synthetic data generation has emerged as a scalable approach for creating such datasets, maintaining consistent quality standards remains challenging. Recent approaches incorporate feedback to improve data quality, but typically operate at the sample level, generating and applying feedback for each response individually. In this work, we propose Reference-Level Feedback, a novel methodology that instead collects feedback based on high-quality reference samples from carefully curated seed data. We use this feedback to capture rich signals of desirable characteristics that can be propagated to newly synthesized data. We present REFED, a dataset of 10K instruction-response pairs synthesized using such feedback. We demonstrate the effectiveness of our approach by showing that Llama-3.1-8B-Instruct finetuned on REFED achieves state-of-the-art performance among similar-sized SFT-based models on AlpacaEval 2.0 and strong results on Arena-Hard. Through extensive experiments, we show that our approach consistently outperforms traditional sample-level feedback methods with significantly fewer feedback collections and improves performance across different model architectures.
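The core contrast in the abstract, sample-level versus reference-level feedback, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual pipeline: the function names (`generate`, `critique`, `critique_ref`, `refine`) are hypothetical placeholders standing in for LLM calls, and the point is only that reference-level feedback is collected once from seed references and reused, rather than once per synthesized response.

```python
def sample_level(instructions, generate, critique, refine):
    """Baseline: one feedback (critique) call per synthesized response."""
    out = []
    for inst in instructions:
        resp = generate(inst)
        fb = critique(inst, resp)           # feedback collected per sample
        out.append((inst, refine(resp, fb)))
    return out


def reference_level(instructions, seed_refs, generate, critique_ref, refine):
    """Reference-level: feedback is extracted once from curated seed
    references, then reused to refine every newly synthesized sample."""
    shared_fb = [critique_ref(ref) for ref in seed_refs]  # few calls total
    out = []
    for inst in instructions:
        resp = generate(inst)
        out.append((inst, refine(resp, shared_fb)))  # no per-sample critique
    return out
```

With N instructions and K seed references (K much smaller than N), the baseline issues N critique calls while the reference-level variant issues only K, which is the source of the "significantly fewer feedback collections" claim.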
Problem

Research questions and friction points this paper is trying to address.

Synthetic data generation scales instruction-tuning datasets, but quality remains inconsistent
Sample-level feedback requires generating and applying feedback for each response individually, which is costly
Feedback collected per sample generalizes poorly to newly synthesized data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reference-Level Feedback: feedback collected from high-quality reference samples in curated seed data
Rich signals of desirable characteristics propagated to newly synthesized data (the 10K-pair REFED dataset)
State-of-the-art AlpacaEval 2.0 results among similar-sized SFT-based models, with far fewer feedback collections