🤖 AI Summary
Existing LLM self-training methods under-sample difficult queries, leading to inadequate learning on complex tasks. To address this, we propose DAST, a difficulty-aware self-training framework built on a sampling-based dynamic difficulty estimation mechanism. DAST combines difficulty-weighted data augmentation with supervised fine-tuning (SFT) and direct preference optimization (DPO), overcoming the limitations of uniform sampling in conventional approaches. By identifying challenging queries and reinforcing both response generation and preference learning on them, DAST improves model performance and generalization on demanding tasks such as mathematical reasoning. Experiments show consistent gains across multiple benchmarks, validating the benefit of difficulty-aware strategies in LLM self-training.
📝 Abstract
Current self-training methods for Large Language Models (LLMs) tend to under-sample challenging queries, leading to inadequate learning on difficult problems and limiting model capability. This work therefore proposes a difficulty-aware self-training (DAST) framework that improves both the quantity and quality of self-generated responses to challenging queries during self-training. DAST comprises three components: 1) sampling-based difficulty level estimation, 2) difficulty-aware data augmentation, and 3) a self-training algorithm combining SFT and DPO. Experiments on mathematical tasks demonstrate the effectiveness and generalization of DAST, highlighting the critical role of difficulty-aware strategies in advancing LLM self-training.
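The abstract does not give the exact formula for component 1), but a natural reading of "sampling-based difficulty level estimation" is that difficulty is the failure rate over k sampled responses per query. A minimal sketch under that assumption follows; `model_sample_fn` and `is_correct_fn` are hypothetical stand-ins for the model's sampler and an answer verifier, not names from the paper:

```python
def estimate_difficulty(query, model_sample_fn, is_correct_fn, k=8):
    """Sampling-based difficulty estimate (a sketch, not the paper's exact method).

    Samples k responses for the query and returns the failure rate
    in [0, 1]: queries the model rarely answers correctly score high,
    so the augmentation step can allocate them more samples.
    """
    responses = [model_sample_fn(query) for _ in range(k)]
    failures = sum(1 for r in responses if not is_correct_fn(query, r))
    return failures / k


# Deterministic toy usage with stub functions standing in for an LLM:
always_wrong = lambda q: "41"
always_right = lambda q: "42"
check = lambda q, r: r == "42"

hard_score = estimate_difficulty("What is 6*7?", always_wrong, check, k=4)
easy_score = estimate_difficulty("What is 6*7?", always_right, check, k=4)
```

A difficulty-weighted augmentation step could then sample additional responses in proportion to this score, concentrating self-training data on the hard queries the abstract says are under-sampled.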