Gen-Searcher: Reinforcing Agentic Search for Image Generation

2026-03-30
AI Summary
This work addresses a limitation of existing image generation models: they rely on static internal knowledge and struggle to incorporate the external or up-to-date information that real-world scenarios require. The authors propose what they describe as the first trainable search-augmented image generation agent, which uses multi-hop reasoning and search to gather both textual knowledge and reference images for knowledge-grounded synthesis. To support this approach, they introduce two datasets, Gen-Searcher-SFT-10k and Gen-Searcher-RL-6k, and a new evaluation benchmark, KnowGen. Training proceeds in two stages: supervised fine-tuning, followed by agentic reinforcement learning with GRPO under a dual reward that combines text-based and image-based feedback. In the reported experiments, Gen-Searcher improves Qwen-Image by around 16 points on KnowGen and around 15 points on WISE.
Abstract
Recent image generation models have shown strong capabilities in generating high-fidelity and photorealistic images. However, they are fundamentally constrained by frozen internal knowledge, thus often failing on real-world scenarios that are knowledge-intensive or require up-to-date information. In this paper, we present Gen-Searcher, as the first attempt to train a search-augmented image generation agent, which performs multi-hop reasoning and search to collect the textual knowledge and reference images needed for grounded generation. To achieve this, we construct a tailored data pipeline and curate two high-quality datasets, Gen-Searcher-SFT-10k and Gen-Searcher-RL-6k, containing diverse search-intensive prompts and corresponding ground-truth synthesis images. We further introduce KnowGen, a comprehensive benchmark that explicitly requires search-grounded external knowledge for image generation and evaluates models from multiple dimensions. Based on these resources, we train Gen-Searcher with SFT followed by agentic reinforcement learning with dual reward feedback, which combines text-based and image-based rewards to provide more stable and informative learning signals for GRPO training. Experiments show that Gen-Searcher brings substantial gains, improving Qwen-Image by around 16 points on KnowGen and 15 points on WISE. We hope this work can serve as an open foundation for search agents in image generation, and we fully open-source our data, models, and code.
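The abstract describes a dual reward that combines text-based and image-based feedback for GRPO training, but gives no formula. The following is only a minimal sketch of how such a signal could be wired together, not the authors' implementation. The function names, the linear blend, the `text_weight` parameter, and the example scores are all assumptions made for illustration. The group-relative normalization (reward minus group mean, divided by group standard deviation) follows the usual GRPO formulation; the choice of population standard deviation here is also an assumption.

```python
from statistics import mean, pstdev


def dual_reward(text_score, image_score, text_weight=0.5):
    """Blend a text-based and an image-based score into one scalar.

    Both scores are assumed to lie in [0, 1]; the linear blend and the
    default weight are illustrative assumptions, not taken from the paper.
    """
    return text_weight * text_score + (1.0 - text_weight) * image_score


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward within its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against zero spread
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores for four rollouts sampled from the same prompt.
text_scores = [0.9, 0.4, 0.7, 0.2]
image_scores = [0.8, 0.5, 0.3, 0.2]

rewards = [dual_reward(t, i) for t, i in zip(text_scores, image_scores)]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

In this sketch a rollout is rewarded only relative to the other rollouts for the same prompt, so a trajectory that retrieves useful text but yields a weak image can still rank above one that fails on both counts.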
Problem

Research questions and friction points this paper is trying to address.

image generation
knowledge-intensive
search-augmented
grounded generation
external knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

search-augmented generation
agentic reinforcement learning
multi-hop reasoning
grounded image generation
dual reward feedback