Correlation-Weighted Multi-Reward Optimization for Compositional Generation

📅 2026-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of incomplete semantic composition in existing text-to-image generation models when handling multi-concept prompts, where certain concepts are often omitted. To mitigate this issue, the authors propose an adaptive reward weighting mechanism grounded in inter-concept correlations. The approach first decomposes input prompts into three conceptual categories—objects, attributes, and relations—and employs specialized reward models to assess the satisfaction level of each concept. By analyzing the correlation structure among these concepts, the method dynamically estimates their relative optimization difficulty and reweights the corresponding reward signals, prioritizing reinforcement for conflicting or harder-to-satisfy concepts. This strategy effectively alleviates interference in multi-objective optimization, yielding significant improvements in generation consistency on SD3.5 and FLUX.1-dev, and achieving consistent performance gains across multiple multi-concept benchmarks, including ConceptMix, GenEval 2, and T2I-CompBench.

📝 Abstract
Text-to-image models produce images that align well with natural language prompts, but compositional generation has long been a central challenge. Models often struggle to satisfy multiple concepts within a single prompt, frequently omitting some concepts and achieving only partial success. Such failures highlight the difficulty of jointly optimizing multiple concepts during reward optimization, where competing concepts can interfere with one another. To address this limitation, we propose Correlation-Weighted Multi-Reward Optimization, a framework that leverages the correlation structure among concept rewards to adaptively weight each attribute concept during optimization. By accounting for interactions among concepts, our method balances competing reward signals and emphasizes concepts that are partially satisfied yet inconsistently generated across samples, improving compositional generation. Specifically, we decompose multi-concept prompts into pre-defined concept groups (e.g., objects, attributes, and relations) and obtain reward signals from dedicated reward models for each concept. We then adaptively reweight these rewards, assigning higher weights to conflicting or hard-to-satisfy concepts using correlation-based difficulty estimation. By focusing optimization on the most challenging concepts within each group, our method encourages the model to consistently satisfy all requested attributes simultaneously. We apply our approach to train state-of-the-art diffusion models, SD3.5 and FLUX.1-dev, and demonstrate consistent improvements on challenging multi-concept benchmarks, including ConceptMix, GenEval 2, and T2I-CompBench.
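The core reweighting idea can be sketched as follows. This is a minimal illustration, not the paper's actual formula: it assumes difficulty is proxied by a concept's reward variance across samples (inconsistent satisfaction) plus its negative correlations with other concepts (conflict), and the function names and weighting scheme are hypothetical.

```python
import numpy as np

def correlation_weighted_rewards(rewards: np.ndarray) -> np.ndarray:
    """Reweight per-concept rewards using their correlation structure.

    rewards: shape (n_samples, n_concepts); each entry is the score a
    dedicated reward model assigns to one concept on one generated image.
    Returns one scalar reward per sample.
    """
    # Inconsistency proxy: concepts satisfied in some samples but not
    # others have high variance (hypothetical difficulty signal).
    inconsistency = rewards.var(axis=0)

    # Conflict proxy: a concept that anti-correlates with the others tends
    # to be traded off against them during multi-objective optimization.
    corr = np.nan_to_num(np.corrcoef(rewards, rowvar=False))
    np.fill_diagonal(corr, 0.0)
    conflict = np.clip(-corr, 0.0, None).sum(axis=1)

    # Combine into a difficulty score and normalize to weights summing to 1,
    # so hard or conflicting concepts receive larger optimization pressure.
    difficulty = inconsistency + conflict
    weights = difficulty / (difficulty.sum() + 1e-8)

    # Scalarized reward per sample, emphasizing the difficult concepts.
    return rewards @ weights
```

In a reward-optimization loop, these scalarized rewards would replace a plain average of the per-concept scores when computing the policy-gradient or reward-backpropagation objective.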
Problem

Research questions and friction points this paper is trying to address.

compositional generation
multi-concept prompts
reward optimization
concept omission
text-to-image models
Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional generation
multi-reward optimization
correlation-weighted
text-to-image diffusion models
concept consistency