🤖 AI Summary
How can large language models (LLMs) be better aligned without relying on preference data or reward models? This paper introduces Discriminative Fine-Tuning (DFT), which shifts from the conventional generative token-prediction paradigm to discriminative answer selection. DFT establishes the first probabilistic framework that explicitly models the discriminative likelihood of positive answers relative to the entire output space. Training is performed end-to-end via logits-level suppression of negative answers, obviating reinforcement learning and preference annotations entirely. Experiments demonstrate that DFT significantly outperforms supervised fine-tuning (SFT) across multiple benchmarks, matching or exceeding the performance of SFT→PO (preference-optimized) methods. Crucially, this work provides the first empirical validation of discriminative fine-tuning—without any preference data—as an effective, generalizable, and practical alignment strategy for LLMs.
📝 Abstract
Supervised fine-tuning (SFT) followed by preference optimization (PO), denoted SFT$\rightarrow$PO, has become the standard approach for improving pretrained large language models (LLMs), with PO delivering significant performance gains. However, PO methods rely on either human-labeled preference data or a strong reward model to generate preference data. Can we fine-tune LLMs without preference data or reward models while achieving performance competitive with SFT$\rightarrow$PO? We address this question by introducing Discriminative Fine-Tuning (DFT), a novel approach that eliminates the need for preference data. Unlike SFT, which employs a generative approach and overlooks negative data, DFT adopts a discriminative paradigm that increases the probability of positive answers while suppressing potentially negative ones, shifting from token prediction to data prediction. Our contributions include: (i) a discriminative probabilistic framework for fine-tuning LLMs that explicitly models the discriminative likelihood of an answer among all possible outputs given an input; (ii) efficient algorithms to optimize this discriminative likelihood; and (iii) extensive experiments demonstrating DFT's effectiveness, achieving performance better than SFT and comparable to, if not better than, SFT$\rightarrow$PO. The code can be found at https://github.com/PenGuln/DFT.
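To make the discriminative idea concrete, here is a minimal toy sketch of such an objective: the model's sequence log-likelihoods act as scores, and the loss is the negative log of the positive answer's softmax probability among a candidate set (the positive plus sampled negatives). This is an illustrative assumption, not the authors' implementation: the actual DFT framework models the likelihood of an answer relative to the entire output space and uses efficient algorithms to optimize it, whereas `dft_loss` below approximates the output space with a small candidate set.

```python
import math

def dft_loss(pos_logprob, neg_logprobs):
    """Toy discriminative loss (assumed form, not the paper's exact objective).

    pos_logprob:  model log-likelihood of the positive (good) answer
    neg_logprobs: log-likelihoods of sampled negative answers

    Returns -log( exp(s+) / sum_i exp(s_i) ), i.e. the negative log softmax
    probability of the positive answer among all candidates. Minimizing it
    raises the positive answer's likelihood while suppressing negatives.
    """
    scores = [pos_logprob] + list(neg_logprobs)
    # Numerically stable log-sum-exp over candidate scores.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - pos_logprob

# The loss shrinks as the positive answer pulls ahead of the negatives:
# equal scores give log(1 + #negatives), well-separated scores approach 0.
```

In a training loop, the gradient of this loss pushes up the positive answer's logits while pushing down those of the negatives, which is the generative-to-discriminative shift the abstract describes, without any preference pairs or reward model.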