🤖 AI Summary
This work addresses the low data quality and poor adaptability that arise when offline data selection and online generation are decoupled in large language model (LLM) post-training. We propose a unified optimization framework to jointly govern both processes. Methodologically, we introduce a novel bilevel data selection mechanism: an upper level drives model self-adaptation via validation-set feedback, while a lower level jointly optimizes offline data weighting and online generation policies, supporting both implicit and explicit data-quality modeling. We theoretically establish the framework's effectiveness, enabling synergistic co-optimization of data selection and model iteration. Experiments on quality enhancement and safety-aware fine-tuning demonstrate that our approach significantly outperforms the unfiltered direct-mixing baseline, yielding measurable improvements in downstream task performance and data quality.
📝 Abstract
Offline data selection and online self-refining generation, both of which enhance data quality, are crucial steps in adapting large language models (LLMs) to specific downstream tasks. We tackle both from an optimization perspective: bilevel data selection performs offline data selection with respect to a validation dataset, and online self-refining generation is treated as a model-adaptation step that selects the model, trained on current responses, that best fits the validation data. Our framework offers a unified understanding of offline data selection and self-refining generation by assigning a learned data weight, explicitly or implicitly, to each question and response. For the first time, we theoretically demonstrate the effectiveness of the bilevel data selection framework and show its performance gains over unfiltered direct-mixing baselines. By combining offline data with validation-weighted online generations, our method enhances fine-tuning performance. Experiments on quality enhancement and safety-aware LLM fine-tuning validate its effectiveness.
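The bilevel idea above — learn per-example data weights at an upper level against a validation set, while a lower level fits the model on the weighted data — can be sketched on a toy weighted regression problem. This is a minimal illustration under our own assumptions (a synthetic linear task, sigmoid-parameterized weights, and a one-step unrolled hypergradient), not the paper's LLM fine-tuning setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumption for illustration): a linear task where the first
# half of the training set is corrupted by heavy label noise.
d = 5
w_true = rng.normal(size=d)
X_tr = rng.normal(size=(40, d))
y_tr = X_tr @ w_true
y_tr[:20] += rng.normal(scale=5.0, size=20)   # noisy subset
X_val = rng.normal(size=(20, d))
y_val = X_val @ w_true                        # clean validation data

alpha = np.zeros(40)   # logits of per-example data weights (upper level)
theta = np.zeros(d)    # model parameters (lower level)
lr_in, lr_out = 0.05, 0.5

for _ in range(200):
    w = 1.0 / (1.0 + np.exp(-alpha))          # weights in (0, 1)

    # Lower level: one gradient step on the weighted training loss
    # L_tr = (1/2n) sum_i w_i (x_i . theta - y_i)^2.
    r_tr = X_tr @ theta - y_tr
    g_in = X_tr.T @ (w * r_tr) / len(y_tr)
    theta_new = theta - lr_in * g_in

    # Upper level: one-step unrolled hypergradient of the validation loss
    # with respect to the data weights, chained through the sigmoid.
    r_val = X_val @ theta_new - y_val
    g_theta = X_val.T @ r_val / len(y_val)    # dL_val / dtheta_new
    g_w = -lr_in * (X_tr @ g_theta) * r_tr / len(y_tr)
    alpha -= lr_out * g_w * w * (1.0 - w)

    theta = theta_new

w = 1.0 / (1.0 + np.exp(-alpha))
# Noisy examples typically receive lower learned weights than clean ones.
print("mean weight (noisy):", w[:20].mean())
print("mean weight (clean):", w[20:].mean())
```

In the paper's setting the lower level would also cover online generation policies, and the weights may be modeled implicitly rather than as explicit per-example parameters; the sketch only shows the explicit-weight, offline-selection case.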