APEX: Learning Adaptive Priorities for Multi-Objective Alignment in Vision-Language Generation

📅 2026-01-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the optimization imbalance in multi-objective alignment caused by static linear scalarization, where fixed weights often lead models to overfit high-variance objectives—such as OCR—at the expense of perceptual goals. To mitigate this, the authors propose the APEX framework, which identifies two key imbalance mechanisms: “variance hijacking” and “gradient conflict.” APEX introduces a two-stage adaptive normalization scheme to stabilize heterogeneous reward signals and incorporates a P³ adaptive prioritization scheduler that dynamically balances learning potential, conflict penalties, and progress demands. Evaluated through fine-tuning Stable Diffusion 3.5 across four heterogeneous objectives, APEX achieves superior Pareto trade-offs: +1.31 in PickScore, +0.35 in DeQA, +0.53 in aesthetic score, while maintaining stable OCR accuracy.

📝 Abstract
Multi-objective alignment for text-to-image generation is commonly implemented via static linear scalarization, but fixed weights often fail under heterogeneous rewards, leading to optimization imbalance where models overfit high-variance, high-responsiveness objectives (e.g., OCR) while under-optimizing perceptual goals. We identify two mechanistic causes: variance hijacking, where reward dispersion induces implicit reweighting that dominates the normalized training signal, and gradient conflicts, where competing objectives produce opposing update directions and trigger seesaw-like oscillations. We propose APEX (Adaptive Priority-based Efficient X-objective Alignment), which stabilizes heterogeneous rewards with Dual-Stage Adaptive Normalization and dynamically schedules objectives via P^3 Adaptive Priorities that combine learning potential, conflict penalty, and progress need. On Stable Diffusion 3.5, APEX achieves improved Pareto trade-offs across four heterogeneous objectives, with balanced gains of +1.31 PickScore, +0.35 DeQA, and +0.53 Aesthetics while maintaining competitive OCR accuracy, mitigating the instability of multi-objective alignment.
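The paper's implementation is not shown on this page, but the two ideas the abstract names, normalizing heterogeneous reward signals so no single objective's variance dominates, and reweighting objectives by an adaptive priority combining learning potential, conflict penalty, and progress need, can be sketched as follows. This is a minimal illustration under assumed design choices (per-batch z-scoring, a softmax over a linear priority score); the function names and the exact combination rule are illustrative assumptions, not the authors' Dual-Stage Adaptive Normalization or P^3 scheduler.

```python
import numpy as np

def normalize_rewards(rewards, eps=1e-8):
    """Z-score one objective's reward batch so a high-variance
    objective (e.g. OCR) cannot implicitly dominate the combined
    training signal -- a stand-in for the paper's normalization stage."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def adaptive_priorities(potential, conflict, progress_need, temp=1.0):
    """Illustrative priority scheduler: score each objective by
    learning potential minus a gradient-conflict penalty plus remaining
    progress need, then softmax the scores into weights summing to 1."""
    score = potential - conflict + progress_need
    z = np.exp((score - score.max()) / temp)  # shift for numerical stability
    return z / z.sum()

# Toy example with four objectives (e.g. OCR, PickScore, DeQA, Aesthetics):
# a saturated, conflict-prone objective (index 0) is down-weighted.
w = adaptive_priorities(
    potential=np.array([0.1, 0.6, 0.5, 0.4]),
    conflict=np.array([0.5, 0.1, 0.1, 0.2]),
    progress_need=np.array([0.0, 0.3, 0.2, 0.2]),
)

# Normalize each objective's raw rewards before mixing with weights w.
batch = {"ocr": [0.9, 0.2, 0.7], "pickscore": [21.0, 22.5, 20.1]}
normed = {name: normalize_rewards(vals) for name, vals in batch.items()}
```

The weights then scalarize the normalized per-objective rewards, so the effective objective mix shifts during training rather than staying fixed as in static linear scalarization.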
Problem

Research questions and friction points this paper is trying to address.

multi-objective alignment · heterogeneous rewards · optimization imbalance · variance hijacking · gradient conflicts
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-objective alignment · adaptive prioritization · variance hijacking · gradient conflict · vision-language generation
👥 Authors

- Dongliang Chen (East China Normal University)
- Xinlin Zhuang (East China Normal University)
- Junjie Xu (East China Normal University)
- Luojian Xie (East China Normal University)
- Zehui Wang (East China Normal University)
- Jiaxi Zhuang (East China Normal University)
- Haolin Yang (University of Chicago)
- Liang Dou (East China Normal University)
- Xiao He (Professor, School of Chemistry and Molecular Engineering, East China Normal University)
- Xingjiao Wu (East China Normal University)
- Ying Qian (East China Normal University)