🤖 AI Summary
Current LLM alignment relies heavily on large-scale human-annotated datasets, entailing high costs, poor reproducibility, and an unclear scaling relationship between data volume and performance. To address this, we propose PiKa, a highly data-efficient synthetic-data paradigm that constructs the high-quality alignment dataset PiKa-SFT from only 30K samples, eliminating dependence on proprietary or human-labeled data. Methodologically, PiKa integrates AI-generated data synthesis, reinforcement learning from AI feedback (RLAIF), and supervised fine-tuning (SFT) within an iterative optimization framework. We post-train base models from the Llama-3 and Qwen2.5 families on this data alone. Experiments show that Llama-3-8B fine-tuned on PiKa-SFT surpasses the official Llama-3-8B-Instruct on AlpacaEval 2.0 and Arena-Hard, and that all Qwen2.5 variants exhibit consistent improvements. These results validate the efficacy and generalizability of small-scale, high-quality synthetic data, offering a scalable, low-cost alignment pathway for resource-constrained settings.
📝 Abstract
Reinforcement Learning from Human Feedback (RLHF) has become a cornerstone for aligning large language models (LLMs). However, its effectiveness depends on high-quality instruction data. Most existing alignment datasets are either private or require costly human annotation, which limits reproducibility and scalability. Even with Reinforcement Learning from AI Feedback (RLAIF), concerns about data quality remain. Moreover, it is unclear how much data is actually required to fine-tune a base model into a strong instruction-following model. Current approaches often rely on over 300k examples even at the supervised fine-tuning (SFT) stage, yet they still underperform proprietary models, creating barriers for academic and resource-limited communities. To address this gap, we introduce PiKa, a data-efficient family of expert-level alignment datasets. In particular, the PiKa-SFT dataset uses only 30k SFT examples, far fewer than state-of-the-art datasets such as Magpie. By fine-tuning Llama-3-8B-Base on PiKa and on other public datasets, we show that PiKa-SFT outperforms models trained on much larger datasets. On the AlpacaEval 2.0 and Arena-Hard benchmarks, PiKa-SFT fine-tuning even surpasses the official Llama-3-8B-Instruct model, which was trained on over 10 million proprietary examples. We further extend our study by training the Qwen2.5 series (0.5B to 7B) on PiKa-SFT, achieving consistent gains. These findings demonstrate that high-quality alignment can be achieved with significantly less data, offering a scalable path for open-source LLM alignment. Code and data: https://github.com/SJY8460/PiKa.
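To make the data-budget idea concrete, here is a minimal sketch of how a capped 30K-example instruction corpus might be rendered into SFT training strings using the public Llama-3 chat layout. This is not the paper's actual pipeline; the field names (`instruction`, `response`), the helper functions, and the demo example are all hypothetical.

```python
# Hypothetical sketch: formatting a small instruction-tuning corpus for SFT.
# The schema ("instruction"/"response") and sample below are illustrative and
# may differ from PiKa-SFT's actual format.

def to_llama3_chat(example: dict) -> str:
    """Render one instruction/response pair in the Llama-3 chat layout."""
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{example['instruction']}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{example['response']}<|eot_id|>"
    )

def build_sft_corpus(examples: list[dict], max_samples: int = 30_000) -> list[str]:
    """Cap the corpus at a 30K budget (as in PiKa-SFT) and format each sample."""
    return [to_llama3_chat(ex) for ex in examples[:max_samples]]

if __name__ == "__main__":
    demo = [{"instruction": "Name a prime number.", "response": "7 is prime."}]
    corpus = build_sft_corpus(demo)
    print(len(corpus))  # 1
```

The formatted strings could then be fed to any standard SFT trainer; the only PiKa-specific choice sketched here is the hard 30K sample cap.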