GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

📅 2026-05-20
📈 Citations: 0 (influential: 0)

🤖 AI Summary
This work addresses open-ended image generation. Here, models must handle diverse and complex user requests, yet they struggle to generalize effectively or to evolve autonomously. To this end, the authors propose GenEvolve, a framework that formulates each generation attempt as a tool-orchestration trajectory. GenEvolve compares multiple trajectories for the same request and extracts structured visual experience from them. This experience is given only to a privileged teacher branch, which guides dense token-level self-distillation in a student model. The approach introduces the first unsupervised visual experience distillation mechanism based on trajectory contrast. It is combined with a procedural prompt-reference construction paradigm that jointly optimizes reference selection and prompt formulation. On established benchmarks and the newly introduced GenEvolve-Bench, GenEvolve substantially outperforms strong baselines and achieves state-of-the-art performance.
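The summary does not specify the distillation objective. The following is a minimal sketch under two assumptions. First, the teacher and student are the same model run with different contexts. Second, the loss is a per-token reverse KL divergence, KL(student ‖ teacher), averaged over a student-sampled sequence, which is a common choice in on-policy distillation. The function names are hypothetical. The logits are plain Python lists rather than outputs of any particular model library.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_reverse_kl(student_logits: Sequence[float],
                     teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at a single token position."""
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def distillation_loss(student_seq: Sequence[Sequence[float]],
                      teacher_seq: Sequence[Sequence[float]]) -> float:
    """Mean per-token reverse KL over a student-sampled sequence.

    student_seq[i] and teacher_seq[i] hold logits for the same position i.
    In this sketch the student sees only the request, while the hypothetical
    teacher branch also sees the structured visual experience.
    """
    if not student_seq or len(student_seq) != len(teacher_seq):
        raise ValueError("need equal-length, non-empty logit sequences")
    per_token = [token_reverse_kl(s, t) for s, t in zip(student_seq, teacher_seq)]
    return sum(per_token) / len(per_token)
```

Every token position contributes its own term, so the supervision is dense. A scalar reward, by contrast, would assign one number to the whole image.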
📝 Abstract
Open-ended image generation is no longer a simple prompt-to-image problem. High-quality generation often requires an agent to combine a model's internal generative ability with external resources. As requests become more diverse and demanding, we aim to develop a general image-generation agent that can self-evolve through trajectories and use tools more effectively across varied generation challenges. To this end, we propose GenEvolve, a self-evolving framework based on Tool-Orchestrated Visual Experience Distillation. In GenEvolve, each generation attempt is modeled as a tool-orchestrated trajectory, where the agent gathers evidence, selects references, invokes generation skills, and composes them into a prompt-reference program. Unlike existing agentic generation methods that mainly rely on image-level scalar rewards, GenEvolve compares multiple trajectories for the same request and abstracts best-worst differences into structured visual experience, provided only to a privileged teacher branch. Inspired by on-policy self-distillation, Visual Experience Distillation provides dense token-level supervision, helping the student internalize better search, knowledge activation, reference selection, and prompt construction. We further construct GenEvolve-Data and GenEvolve-Bench. Experiments on public benchmarks and GenEvolve-Bench show substantial gains over strong baselines, achieving state-of-the-art performance among current image-generation frameworks. Project website: https://ephemeral182.github.io/GenEvolve/
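The following minimal sketch makes the trajectory and the best-worst contrast concrete. All class, field, and tool names are hypothetical. The sketch assumes some scalar `score` is available for picking the best and worst attempts. A simple set difference over tools and references stands in for the paper's abstraction into structured visual experience, whose exact form the abstract does not describe. `build_contexts` illustrates the privileged split: only the teacher context contains the experience.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ToolCall:
    tool: str       # hypothetical tool name, e.g. "web_search" or "generate"
    argument: str   # free-form argument passed to the tool


@dataclass
class Trajectory:
    calls: List[ToolCall]   # ordered tool invocations for one attempt
    prompt: str             # final prompt of the prompt-reference program
    references: List[str]   # ids of the reference images the agent selected
    score: float            # hypothetical scalar score used only for ranking


def contrast_best_worst(trajs: List[Trajectory]) -> Dict[str, List[str]]:
    """Summarize how the best- and worst-scoring attempts differ.

    A set-difference stand-in for the paper's abstraction step.
    """
    if len(trajs) < 2:
        raise ValueError("need at least two trajectories for the same request")
    best = max(trajs, key=lambda t: t.score)
    worst = min(trajs, key=lambda t: t.score)
    best_tools = {c.tool for c in best.calls}
    worst_tools = {c.tool for c in worst.calls}
    return {
        "tools_only_in_best": sorted(best_tools - worst_tools),
        "tools_only_in_worst": sorted(worst_tools - best_tools),
        "refs_only_in_best": sorted(set(best.references) - set(worst.references)),
        "refs_only_in_worst": sorted(set(worst.references) - set(best.references)),
    }


def build_contexts(request: str,
                   experience: Dict[str, List[str]]) -> Tuple[str, str]:
    """Return (student_context, teacher_context); only the teacher sees experience."""
    student = f"Request: {request}"
    lines = [f"- {key}: {', '.join(vals) or 'none'}" for key, vals in experience.items()]
    teacher = student + "\nVisual experience (teacher only):\n" + "\n".join(lines)
    return student, teacher
```

Consider two attempts at the same request where only the higher-scoring one called a search tool. In that case, `tools_only_in_best` would list that tool, and only the teacher context would mention it. The student would then be trained to match a teacher that had this hint.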
Problem

Research questions and friction points this paper is trying to address.

self-evolving
image generation
tool orchestration
visual experience
agent
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Evolving Agents
Visual Experience Distillation
Tool-Orchestrated Trajectory
Token-Level Supervision
Image Generation