MaskFocus: Focusing Policy Optimization on Critical Steps for Masked Image Generation

📅 2025-12-21

📈 Citations: 0

✨ Influential: 0

career value

220K/year

🤖 AI Summary

Reinforcement learning (RL) in masked image generation suffers from low optimization efficiency, high computational cost, and suboptimal results due to multi-step trajectory dependencies. Method: This paper proposes a critical-step-focused RL optimization framework. Its core innovations are: (1) an information-gain evaluation mechanism—novelly based on similarity between intermediate and final images—to dynamically identify high-value sampling steps; and (2) an entropy-driven dynamic routing sampling strategy to suppress interference from irrelevant steps. The framework concentrates RL training exclusively on iterations exhibiting significant information gain, thereby minimizing futile exploration. Results: Evaluated across multiple text-to-image benchmarks, the method improves generation quality (FID reduced by 12.3%) and accelerates convergence (training steps reduced by 37%), all without modifying model architecture. These results validate both the effectiveness of the critical-step optimization paradigm and its generalizability across diverse architectures.

Technology Category

Application Category

📝 Abstract

Reinforcement learning (RL) has demonstrated significant potential for post-training language models and autoregressive visual generative models, but adapting RL to masked generative models remains challenging. The core factor is that policy optimization requires accounting for the probability likelihood of each step due to its multi-step and iterative refinement process. This reliance on entire sampling trajectories introduces high computational cost, whereas natively optimizing random steps often yields suboptimal results. In this paper, we present MaskFocus, a novel RL framework that achieves effective policy optimization for masked generative models by focusing on critical steps. Specifically, we determine the step-level information gain by measuring the similarity between the intermediate images at each sampling step and the final generated image. Crucially, we leverage this to identify the most critical and valuable steps and execute focused policy optimization on them. Furthermore, we design a dynamic routing sampling mechanism based on entropy to encourage the model to explore more valuable masking strategies for samples with low entropy. Extensive experiments on multiple Text-to-Image benchmarks validate the effectiveness of our method.

Problem

Research questions and friction points this paper is trying to address.

Adapting reinforcement learning to masked generative models efficiently

Reducing computational cost by focusing policy optimization on critical steps

Identifying valuable sampling steps using step-level information gain metrics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Focuses policy optimization on critical steps

Measures step-level information gain via image similarity

Uses dynamic routing sampling based on entropy

🔎 Similar Papers

Face Mask Removal with Region-attentive Face Inpainting

2024-09-10arXiv.orgCitations: 0

Bosch Group

Renningen, BW, DE

Master Thesis Reinforcement Learning for Behavior Planning in Automated Driving

Bosch Group

Renningen, BW, DE

AI Research Scientist, Language - Monetization GenAI