Beyond Binary Preference: Aligning Diffusion Models to Fine-grained Criteria by Decoupling Attributes

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 0 (influential: 0)
🤖 AI Summary
This work addresses the limitation of existing diffusion model alignment methods, which rely on binary preferences or scalar rewards and thus fail to capture the nuanced, hierarchical judgments of human experts regarding image quality. To overcome this, the study introduces non-binary, fine-grained human preferences structured as a multi-attribute evaluation framework annotated by domain experts. The authors propose a two-stage alignment framework: first, domain knowledge is injected into an auxiliary diffusion model via supervised fine-tuning; second, a novel Complex Preference Optimization (CPO) algorithm is designed to simultaneously enhance desirable attributes and suppress undesirable ones in the target model. Experiments on artistic image generation demonstrate that the proposed approach significantly improves alignment between generated outputs and expert standards, validating the effectiveness and scalability of fine-grained preference alignment.
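The summary describes a multi-attribute framework with hierarchical judgments and with desirable and undesirable attributes. One way to picture it is as a tree whose leaves are attributes tagged positive or negative. The minimal sketch below assumes that layout. The class `CriterionNode`, the helper `split_annotation`, and every attribute name are hypothetical illustrations, not the authors' criteria or code. It shows how one image's expert annotation could be split into the positive and negative attribute sets that a preference objective would consume.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class CriterionNode:
    """One node of a hypothetical evaluation-criteria tree.

    Internal nodes group related criteria. Leaves are concrete attributes
    tagged with a polarity ("positive" = desirable, "negative" = undesirable).
    """
    name: str
    polarity: Optional[str] = None  # set on leaves only
    children: List["CriterionNode"] = field(default_factory=list)

    def leaves(self) -> List["CriterionNode"]:
        """Return all leaf attributes in depth-first order."""
        if not self.children:
            return [self]
        out: List["CriterionNode"] = []
        for child in self.children:
            out.extend(child.leaves())
        return out


def split_annotation(tree: CriterionNode, present: Set[str]) -> Tuple[List[str], List[str]]:
    """Split the attributes annotated on one image into positive and negative lists."""
    pos: List[str] = []
    neg: List[str] = []
    for leaf in tree.leaves():
        if leaf.name not in present:
            continue
        if leaf.polarity == "positive":
            pos.append(leaf.name)
        elif leaf.polarity == "negative":
            neg.append(leaf.name)
        else:
            raise ValueError(f"leaf {leaf.name!r} has no valid polarity")
    return pos, neg


# Illustrative tree: these attribute names are invented, not the paper's criteria.
tree = CriterionNode("painting quality", children=[
    CriterionNode("composition", children=[
        CriterionNode("balanced layout", "positive"),
        CriterionNode("cluttered layout", "negative"),
    ]),
    CriterionNode("color", children=[
        CriterionNode("harmonious palette", "positive"),
        CriterionNode("muddy colors", "negative"),
    ]),
])

pos, neg = split_annotation(tree, {"balanced layout", "muddy colors"})
print(pos, neg)  # ['balanced layout'] ['muddy colors']
```

Keeping polarity only on leaves lets internal nodes act purely as grouping levels. An actual criteria tree may attach extra information, such as weights or severity levels, that this sketch leaves out.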

📝 Abstract
Post-training alignment of diffusion models relies on simplified signals, such as scalar rewards or binary preferences. This limits alignment with complex human expertise, which is hierarchical and fine-grained. To address this, we first construct hierarchical, fine-grained evaluation criteria with domain experts, which decompose image quality into multiple positive and negative attributes organized in a tree structure. Building on this, we propose a two-stage alignment framework. First, we inject domain knowledge into an auxiliary diffusion model via Supervised Fine-Tuning. Second, we introduce Complex Preference Optimization (CPO), which extends DPO to align the target diffusion model with our non-binary, hierarchical criteria. Specifically, we reformulate the alignment problem to simultaneously maximize the probability of positive attributes and minimize the probability of negative attributes, using the auxiliary diffusion model. We instantiate our approach in the domain of painting generation and conduct CPO training on a dataset of paintings annotated with fine-grained attributes based on our criteria. Extensive experiments demonstrate that CPO significantly enhances generation quality and alignment with expertise, opening new avenues for fine-grained criteria alignment.
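CPO is described as extending DPO so that positive attributes become more likely and negative ones less likely. The abstract does not state the objective, so the sketch below is only a generic DPO-style stand-in under two assumptions. First, it uses a Diffusion-DPO-style implicit reward: the drop in the trained model's denoising error relative to a frozen reference model, with the timestep weighting omitted. Second, it assumes one such scalar is already available per positive and per negative attribute. How CPO actually computes these terms with the auxiliary model is not specified here, and `implicit_reward` and `multi_attribute_preference_loss` are hypothetical names.

```python
import numpy as np


def implicit_reward(eps, eps_theta, eps_ref):
    """Diffusion-DPO-style implicit reward for one noised sample (hypothetical helper).

    eps:       the noise actually added to the image
    eps_theta: noise predicted by the model being trained
    eps_ref:   noise predicted by a frozen reference model
    A lower squared error than the reference counts as a higher reward.
    The timestep weighting used in Diffusion-DPO is omitted.
    """
    err_theta = np.sum((eps - eps_theta) ** 2)
    err_ref = np.sum((eps - eps_ref) ** 2)
    return float(-(err_theta - err_ref))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))."""
    return -np.logaddexp(0.0, -x)


def multi_attribute_preference_loss(pos_rewards, neg_rewards, beta=1.0):
    """Hypothetical DPO-style loss over sets of positive and negative terms.

    Each positive-attribute reward is contrasted with each negative-attribute
    reward through -log sigmoid(beta * (r_pos - r_neg)), and the pairwise
    losses are averaged. This is NOT the paper's CPO objective, only a
    generic stand-in.
    """
    if len(pos_rewards) == 0 or len(neg_rewards) == 0:
        raise ValueError("need at least one positive and one negative term")
    pos = np.asarray(pos_rewards, dtype=float)[:, None]  # shape (P, 1)
    neg = np.asarray(neg_rewards, dtype=float)[None, :]  # shape (1, N)
    margins = beta * (pos - neg)                          # shape (P, N) via broadcasting
    return float(-np.mean(log_sigmoid(margins)))


# Toy usage: random arrays stand in for network predictions. For the
# "positive" terms the trained model is constructed to be closer to the true
# noise than the reference; for the "negative" terms it is farther.
rng = np.random.default_rng(0)
eps = rng.standard_normal(16)
pos_r = [implicit_reward(eps, eps + 0.1 * rng.standard_normal(16),
                         eps + 0.3 * rng.standard_normal(16)) for _ in range(2)]
neg_r = [implicit_reward(eps, eps + 0.3 * rng.standard_normal(16),
                         eps + 0.1 * rng.standard_normal(16)) for _ in range(3)]
loss = multi_attribute_preference_loss(pos_r, neg_r, beta=0.5)
print(f"loss = {loss:.4f}")
```

Contrasting every positive term with every negative term is one design choice. With a single positive and a single negative term it reduces to the familiar binary DPO form. The paper's CPO objective may aggregate attributes differently, for example by using the criteria hierarchy.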
Problem

Research questions and friction points this paper is trying to address.

diffusion models
fine-grained alignment
non-binary preference
hierarchical criteria
post-training alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Complex Preference Optimization
Fine-grained Alignment
Hierarchical Evaluation Criteria
Diffusion Model Alignment
Supervised Fine-Tuning
Authors
Chen Meng, Zhejiang University
Zejian Li, ICTP
Zhongni Liu, University of Electronic Science and Technology of China
Yize Li, Zhejiang University
Changle Xie, Zhejiang University
Kaixin Jia, Zhejiang University
Ling Yang, Postdoc at Princeton University, PhD at Peking University (LLM, Diffusion Models, Reinforcement Learning, Complex Data Modeling)
Huanghuang Deng, Zhejiang University
Shiying Ding, Zhejiang University
Shengyuan Zhang, PhD candidate, NTU Singapore (ultrasonics, machine learning, numerical analysis, non-destructive testing, Li-ion batteries)
Jiayi Li, University of Nottingham Ningbo China
Lingyun Sun, Zhejiang University (Design Intelligence, HCI, Artificial Intelligence, Industrial Design, AIGC)