🤖 AI Summary
Existing image generation and editing approaches often treat the two tasks in isolation. As a result, they struggle to ensure spatial consistency and semantic coherence at the same time, and they lack structured control over object relationships and scene layout. This work proposes SimGraph, a unified scene graph-based framework that integrates token-based generation and diffusion-based editing within a single model, presented as the first end-to-end unification of the two tasks. By leveraging structured scene graphs, SimGraph enables precise control over object interactions, spatial arrangements, and layout configurations, improving both the semantic accuracy and the spatial consistency of its outputs. Extensive experiments demonstrate that SimGraph outperforms state-of-the-art methods across multiple evaluation metrics.
📝 Abstract
Recent advancements in Generative Artificial Intelligence (GenAI) have significantly enhanced the capabilities of both image generation and editing. However, current approaches often treat these tasks separately, leading to inefficiencies and challenges in maintaining spatial consistency and semantic coherence between generated content and edits. Moreover, a major obstacle is the lack of structured control over object relationships and spatial arrangements. Scene graph-based methods, which represent objects and their interrelationships in a structured format, offer a solution by providing greater control over composition and interactions in both image generation and editing. Building on this idea, we introduce SimGraph, a unified framework that integrates scene graph-based image generation and editing, enabling precise control over object interactions, layouts, and spatial coherence. In particular, our framework integrates token-based generation and diffusion-based editing within a single scene graph-driven model, ensuring high-quality and consistent results. Through extensive experiments, we empirically demonstrate that our approach outperforms existing state-of-the-art methods.
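To make the idea of a scene graph concrete, the sketch below shows one common way to represent objects and their relationships as (subject, predicate, object) triples, and how an edit can be expressed as a change to the graph. This is only an illustration: the `Relation` and `SceneGraph` classes, the `replace_object` and `changed_relations` helpers, and the example scene are all hypothetical and are not taken from SimGraph.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """One directed edge of the graph: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SceneGraph:
    """Hypothetical minimal scene graph: object names plus relation triples."""
    objects: set[str] = field(default_factory=set)
    relations: set[Relation] = field(default_factory=set)

    def add_relation(self, subject: str, predicate: str, obj: str) -> None:
        # Adding a relation also registers both endpoints as objects.
        self.objects.update((subject, obj))
        self.relations.add(Relation(subject, predicate, obj))

    def replace_object(self, old: str, new: str) -> "SceneGraph":
        """Return a new graph in which `old` is swapped for `new` everywhere."""
        def swap(name: str) -> str:
            return new if name == old else name

        edited = SceneGraph()
        edited.objects = {swap(o) for o in self.objects}
        edited.relations = {
            Relation(swap(r.subject), r.predicate, swap(r.obj))
            for r in self.relations
        }
        return edited


def changed_relations(before: SceneGraph, after: SceneGraph):
    """Return (removed, added) relation sets describing an edit."""
    return before.relations - after.relations, after.relations - before.relations


# Source scene: a man riding a horse on a beach.
source = SceneGraph()
source.add_relation("man", "riding", "horse")
source.add_relation("horse", "on", "beach")

# Edit: swap the horse for a bicycle; untouched objects keep their relations.
target = source.replace_object("horse", "bicycle")
removed, added = changed_relations(source, target)
```

In this toy setting, a generation task would condition on the full set of triples, while an editing task would only need the `removed` and `added` sets, which is one way to see why a shared graph representation can serve both tasks.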