POCA: Pareto-Optimal Curriculum Alignment for Visual Text Generation

📅 2026-04-27

🤖 AI Summary

This work addresses key challenges in visual text generation: the trade-off between text accuracy and global image coherence, the difficulty of balancing multiple rewards in reinforcement learning, training instability, and inefficient prompt selection. The paper introduces a multi-objective optimization framework that combines Pareto optimality with adaptive curriculum learning. Identifying the Pareto-optimal set of solutions mitigates reward conflicts, and a curriculum driven by automatic difficulty assessment orders alignment training from easy to hard samples within a unified reward space. The reported experiments show significant gains over existing approaches on CLIP score, HPS score, and sentence accuracy, along with better training stability and generation quality.
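The easy-to-hard ordering described above can be illustrated with a minimal sketch. It is not the paper's procedure: the difficulty measure (one minus the mean reward a prompt currently earns), the function names, and the example prompts and reward values are all assumptions made for illustration.

```python
from statistics import mean


def difficulty(rewards):
    """Hypothetical difficulty score: 1 minus the mean of rewards scaled to [0, 1].

    A prompt that already earns high rewards is treated as easy.
    """
    return 1.0 - mean(rewards)


def curriculum_stages(prompt_rewards, num_stages):
    """Sort prompts from easy to hard and split them into consecutive stages."""
    ordered = sorted(prompt_rewards, key=lambda p: difficulty(prompt_rewards[p]))
    if not ordered:
        return []
    size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Made-up reward triples per prompt (e.g. text accuracy, aesthetics, alignment).
prompt_rewards = {
    "a sign that says OPEN": [0.9, 0.8, 0.9],
    "a poster with a long slogan": [0.4, 0.5, 0.3],
    "a menu with three lines of text": [0.2, 0.3, 0.1],
    "a mug printed with HELLO": [0.7, 0.8, 0.6],
}

stages = curriculum_stages(prompt_rewards, num_stages=2)
for index, stage in enumerate(stages, start=1):
    print(f"stage {index}: {stage}")
```

With these values the first stage holds the two prompts the model already handles well, and the second stage holds the two harder ones. An adaptive variant would recompute the scores as training progresses.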

📝 Abstract

Current visual text generation models struggle with the trade-off between text accuracy and overall image coherence. We find that achieving high text accuracy can reduce aesthetic quality and instruction-following capability. Although reinforcement learning approaches can alleviate the problem through aligning with multiple rewards, they are often unstable for text generation, as existing approaches normally optimize multiple rewards in a weighted-sum way. In addition, it is difficult to balance the weight of each reward. Moreover, reinforcement learning requires a set of training instructions. A large number of prompts require more training time and computing resources, while a small set leads to poor performance. Hence, how to select the prompts for efficient training is an unsolved problem. In this study, we propose Pareto-Optimal Curriculum Alignment (POCA), a framework that addresses this issue as a multi-objective problem by: 1) identifying the Pareto-optimal set to avoid simple scalarization and 2) designing an adaptive curriculum alignment strategy to manage a learning sequence of a multi-reward dataset using automatic difficulty assessment, which is crucial for optimal convergence as RL methods explore in a limited data environment. In synergy, POCA finds the Pareto-optimal set in a unified reward space, which eliminates inconsistent signals to find the best trade-off solution from different rewards under an easy-to-hard optimization landscape. The experimental results show that POCA significantly improves all metrics such as CLIP, HPS scores and sentence accuracy.
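To make the contrast between weighted-sum scalarization and a Pareto-optimal set concrete, here is a small sketch. It only illustrates the general notion of non-dominated solutions, not POCA's implementation; the candidate names, the three reward columns, and the weight vectors are assumed values.

```python
def dominates(a, b):
    """True if reward vector a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, rewards in candidates.items()
        if not any(dominates(other, rewards) for other in candidates.values())
    ]


def best_by_weighted_sum(candidates, weights):
    """Pick the single candidate with the highest weighted sum of rewards."""
    return max(
        candidates,
        key=lambda name: sum(w * r for w, r in zip(weights, candidates[name])),
    )


# Assumed rewards per sample: (sentence accuracy, HPS-like score, CLIP-like score).
candidates = {
    "A": (0.95, 0.20, 0.50),  # accurate text, weak aesthetics
    "B": (0.60, 0.80, 0.70),
    "C": (0.55, 0.75, 0.65),  # worse than B on every reward
    "D": (0.80, 0.60, 0.60),
}

front = pareto_front(candidates)
accuracy_heavy = best_by_weighted_sum(candidates, (0.8, 0.1, 0.1))
aesthetics_heavy = best_by_weighted_sum(candidates, (0.2, 0.4, 0.4))
print(front, accuracy_heavy, aesthetics_heavy)
```

In this toy data the weighted-sum winner flips between A and B depending on the chosen weights, while the non-dominated set keeps A, B, and D and drops only C, which B beats on every reward. No weights need to be tuned to obtain that set.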

Problem

Research questions and friction points this paper is trying to address.

visual text generation

multi-objective optimization

reinforcement learning

curriculum learning

Pareto optimality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pareto-Optimal

Curriculum Learning

Multi-Objective Optimization