MCLR: Improving Conditional Modeling in Visual Generative Models via Inter-Class Likelihood-Ratio Maximization and Establishing the Equivalence between Classifier-Free Guidance and Alignment Objectives

📅 2026-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard diffusion models rely on classifier-free guidance (CFG) at inference time to generate high-quality conditional samples, which suggests that their training objective does not explicitly model inter-class discriminability. This work proposes the Maximum Conditional-to-Unconditional Likelihood Ratio (MCLR) alignment objective, which explicitly maximizes the ratio of conditional to unconditional likelihoods during training, enabling the model to reach CFG-level generation quality under standard reverse sampling. The authors provide the first theoretical proof that CFG corresponds to the optimal solution of a weighted MCLR objective, giving a mechanistic explanation of CFG. Experiments show that models fine-tuned with MCLR match CFG-guided sampling in both qualitative and quantitative evaluations while eliminating the need for guidance at inference.
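For reference, the guided score used by CFG can be rewritten so that a likelihood ratio appears explicitly (this is the standard CFG identity; the weight convention below, in which $w = 1$ recovers plain conditional sampling, is one of several used in the literature):

$$
\nabla_{x_t} \log \tilde{p}_w(x_t \mid c)
= \nabla_{x_t} \log p(x_t \mid c)
+ (w - 1)\, \nabla_{x_t} \log \frac{p(x_t \mid c)}{p(x_t)}
$$

The correction term is the gradient of the conditional-to-unconditional log-likelihood ratio, which makes the claimed equivalence intuitive: maximizing that ratio during training (MCLR) plays the same role that the guidance term plays at inference.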

📝 Abstract
Diffusion models have achieved state-of-the-art performance in generative modeling, but their success often relies heavily on classifier-free guidance (CFG), an inference-time heuristic that modifies the sampling trajectory. From a theoretical perspective, diffusion models trained with standard denoising score matching (DSM) are expected to recover the target data distribution, raising the question of why inference-time guidance is necessary in practice. In this work, we ask whether the DSM training objective can be modified in a principled manner such that standard reverse-time sampling, without inference-time guidance, yields effects comparable to CFG. We identify insufficient inter-class separation as a key limitation of standard diffusion models. To address this, we propose MCLR, a principled alignment objective that explicitly maximizes inter-class likelihood ratios during training. Models fine-tuned with MCLR exhibit CFG-like improvements under standard sampling, achieving comparable qualitative and quantitative gains without requiring inference-time guidance. Beyond these empirical benefits, we provide a theoretical result showing that the CFG-guided score is exactly the optimal solution to a weighted MCLR objective. This establishes a formal equivalence between classifier-free guidance and alignment-based objectives, offering a mechanistic interpretation of CFG.
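To make the fine-tuning idea concrete, below is a minimal PyTorch-style sketch of one way to bake the guided score into training, motivated by the paper's result that the CFG-guided score is the optimal solution of a weighted MCLR objective. This resembles guidance distillation rather than the authors' exact loss; the function names, the `q_sample` helper, and the weight `w` are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def cfg_target(teacher, xt, t, cond, null_cond, w):
    # Standard CFG combination of the frozen teacher's conditional and
    # unconditional noise predictions: eps_u + w * (eps_c - eps_u).
    eps_c = teacher(xt, t, cond)
    eps_u = teacher(xt, t, null_cond)
    return eps_u + w * (eps_c - eps_u)

def mclr_style_step(student, teacher, x0, cond, null_cond, q_sample,
                    w=3.0, num_timesteps=1000):
    # Sample a diffusion timestep and noise the clean batch x0.
    t = torch.randint(0, num_timesteps, (x0.size(0),), device=x0.device)
    noise = torch.randn_like(x0)
    xt = q_sample(x0, t, noise)  # forward-noising helper, assumed available
    # Regress the student's conditional prediction onto the teacher's
    # CFG-guided score. Because the guided score optimizes a weighted MCLR
    # objective (per the paper's theorem), hitting this target pushes up the
    # conditional-to-unconditional likelihood ratio at training time, so
    # standard reverse sampling needs no guidance afterward.
    target = cfg_target(teacher, xt, t, cond, null_cond, w)
    return F.mse_loss(student(xt, t, cond), target)
```

A training loop would call `mclr_style_step` on the pretrained model (as `student`) against a frozen copy of itself (as `teacher`), leaving the rest of the usual DSM pipeline unchanged.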
Problem

Research questions and friction points this paper is trying to address.

diffusion models
classifier-free guidance
conditional modeling
inter-class separation
denoising score matching
Innovation

Methods, ideas, or system contributions that make the work stand out.

MCLR
classifier-free guidance
inter-class likelihood-ratio
diffusion models
alignment objective
🔎 Similar Papers
No similar papers found.