🤖 AI Summary
To address the limited diversity of conditional image generation, which stems from insufficient modeling of input uncertainty, this paper proposes Rainbow, the first framework to integrate Generative Flow Networks (GFlowNets) into conditional generation. Rainbow constructs a parameterized latent graph that encodes multimodal, implicit semantic representations of the condition, and samples diverse trajectories over this graph to guide a pretrained generative model toward rich, semantically coherent, and interpretable images. Crucially, it captures uncertainty directly from a single condition, without relying on random-seed perturbation or prompt engineering. Experiments on natural and medical imaging datasets show that Rainbow improves both diversity (+32.7% LPIPS) and fidelity (18.4% better FID) across image synthesis, image generation, and counterfactual generation tasks. This work establishes a new paradigm for uncertainty-aware, controllable generation.
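The diversity and fidelity figures above are relative changes, and their sign depends on the metric: higher pairwise LPIPS means more diverse samples, while lower FID means better fidelity. The sketch below illustrates this under stated assumptions. It computes diversity as the mean distance over all pairs of samples generated for one condition, and it computes relative improvement with the direction of the metric taken into account. Euclidean distance (`math.dist`) is only a placeholder for LPIPS, which uses learned deep features. The function names are hypothetical, and the summary does not say exactly how its percentages were computed.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+); placeholder for LPIPS


def mean_pairwise_diversity(samples, distance=dist):
    """Average distance over all unordered pairs of samples for one condition.

    With LPIPS, `distance` would be a learned perceptual metric on images;
    here the samples are plain feature vectors and the distance is Euclidean.
    """
    pairs = list(combinations(samples, 2))
    if not pairs:  # fewer than two samples: no diversity to measure
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def relative_improvement(baseline, new, higher_is_better=True):
    """Relative improvement in percent.

    For LPIPS diversity, higher is better; for FID, lower is better, so an
    improvement means the new score is *smaller* than the baseline.
    """
    delta = new - baseline if higher_is_better else baseline - new
    return 100.0 * delta / baseline
```

With toy values (not results from the paper), `relative_improvement(2.0, 2.5)` gives 25.0 for a higher-is-better diversity score, and `relative_improvement(20.0, 16.0, higher_is_better=False)` gives 20.0 for an FID that dropped from 20 to 16.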
📝 Abstract
Capturing diversity is crucial in conditional and prompt-based image generation, particularly when the condition contains uncertainty that admits multiple plausible outputs. To generate images that reflect this diversity, traditional methods either vary the random seed, which makes it difficult to discern meaningful differences between samples, or diversify the input prompt, which limits diversity to what can be expressed verbally. We propose Rainbow, a novel conditional image generation framework, applicable to any pretrained conditional generative model, that addresses inherent condition/prompt uncertainty and generates diverse plausible images. Rainbow is based on a simple yet effective idea: decompose the input condition into diverse latent representations, each capturing one aspect of the uncertainty and generating a distinct image. First, we integrate a latent graph, parameterized by Generative Flow Networks (GFlowNets), into the computation of the prompt representation. Second, we leverage GFlowNets' graph sampling capabilities to capture uncertainty and produce multiple diverse trajectories over the graph; together, these trajectories represent the input condition, yielding diverse condition representations and corresponding output images. Evaluations on natural and medical image datasets demonstrate that Rainbow improves both diversity and fidelity across image synthesis, image generation, and counterfactual generation tasks.
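The decomposition idea can be illustrated with a toy sketch. Several trajectories are sampled over a small latent graph, and each one yields its own condition representation, which could then be passed to a pretrained generator. Everything here is a hypothetical stand-in, not Rainbow's actual architecture: the graph size, the per-node embeddings, the fixed `edge_logits`/`stop_logits` policy, and the aggregation rule (condition embedding plus the mean of visited node embeddings). The sketch also uses a fixed, untrained policy. In a GFlowNet, the forward policy would be trained so that trajectories are sampled in proportion to a reward, and the abstract does not specify that objective.

```python
import math
import random


def softmax_sample(logits, rng):
    """Sample an index with probability proportional to exp(logit)."""
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def sample_trajectory(edge_logits, stop_logits, start, max_len, rng):
    """Walk the latent graph from `start`, choosing an unvisited node or
    STOP with a softmax policy; returns the visited node ids in order."""
    path = [start]
    while len(path) < max_len:
        current = path[-1]
        candidates = [j for j in range(len(edge_logits)) if j not in path]
        if not candidates:
            break
        logits = [edge_logits[current][j] for j in candidates] + [stop_logits[current]]
        choice = softmax_sample(logits, rng)
        if choice == len(candidates):  # the last index is the STOP action
            break
        path.append(candidates[choice])
    return path


def condition_representation(cond_emb, node_embs, path):
    """Hypothetical aggregation: condition embedding plus the mean of the
    embeddings of the nodes visited by one trajectory."""
    dim = len(cond_emb)
    mean = [sum(node_embs[n][d] for n in path) / len(path) for d in range(dim)]
    return [c + m for c, m in zip(cond_emb, mean)]


def diverse_representations(cond_emb, node_embs, edge_logits, stop_logits,
                            num_samples=4, max_len=3, seed=0):
    """Sample several trajectories; each one gives a distinct representation
    of the same input condition (one per image to generate)."""
    rng = random.Random(seed)
    reps = []
    for _ in range(num_samples):
        start = rng.randrange(len(node_embs))
        path = sample_trajectory(edge_logits, stop_logits, start, max_len, rng)
        reps.append((path, condition_representation(cond_emb, node_embs, path)))
    return reps


if __name__ == "__main__":
    cond = [1.0, 0.0]                            # toy condition embedding
    nodes = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]  # toy latent-graph nodes
    edges = [[0.0] * 3 for _ in range(3)]         # uniform edge scores
    stops = [0.0, 0.0, 0.0]
    for path, rep in diverse_representations(cond, nodes, edges, stops):
        print(path, rep)
```

Because different trajectories visit different subsets of nodes, one condition maps to several representations. This mirrors the abstract's point that diversity comes from the condition itself rather than from seed changes or prompt rewrites.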