🤖 AI Summary
Scene sketch generation requires joint semantic understanding and regional structural modeling, yet existing methods struggle to integrate heterogeneous regions (e.g., foreground and background) progressively while preserving global compositional consistency. This paper proposes a multi-round regional co-optimization framework: first, semantic segmentation partitions the scene into foreground and background; then, sets of Bézier curves are optimized iteratively per region to enable progressive fusion. A novel stroke initialization strategy ensures geometric integrity and optimization convergence. Additionally, a hybrid loss, combining CLIP-based semantic alignment and VGG-based perceptual feature reconstruction, jointly enforces semantic fidelity and structural detail. Experiments demonstrate that the method significantly outperforms state-of-the-art approaches in both sketch quality (FID, LPIPS) and structural integrity metrics, yielding sketches with superior semantic accuracy and compositional plausibility aligned with human perception.
📝 Abstract
Scene sketching converts a scene into a simplified, abstract representation that captures the essential elements and composition of the original. It requires a semantic understanding of the scene and consideration of the different regions within it. Since scenes often contain diverse visual information across regions, such as foreground objects, background elements, and spatial divisions, handling these regions poses unique difficulties. In this paper, we define a sketch as a collection of sets of Bézier curves, chosen for their smoothness and versatility. We optimize the different regions of the input scene in multiple rounds. In each optimization round, the strokes sampled from the next region can be seamlessly integrated into the sketch generated in the previous round. We propose an additional stroke initialization method to ensure the integrity of the scene and the convergence of the optimization. A novel CLIP-based Semantic Loss and a VGG-based Feature Loss guide our multi-round optimization. Extensive qualitative and quantitative experimental results on the generated sketches confirm the effectiveness of our method.
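The sketch representation described above, sets of Bézier curves accumulated region by region, can be illustrated with a minimal sketch in Python. This is not the authors' implementation: the cubic degree, the class names `BezierStroke` and `Sketch`, the method `add_round`, and the foreground-then-background order are all assumptions made for illustration, and the optimization itself (CLIP-based and VGG-based losses) is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class BezierStroke:
    """One stroke as a cubic Bezier curve (degree is an assumption)."""
    control_points: Tuple[Point, Point, Point, Point]

    def point_at(self, t: float) -> Point:
        """Evaluate the curve at t in [0, 1] using the Bernstein basis."""
        p0, p1, p2, p3 = self.control_points
        u = 1.0 - t
        b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
        x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
        y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
        return (x, y)


@dataclass
class Sketch:
    """A sketch as named sets of strokes, one set per region (hypothetical)."""
    strokes_by_region: Dict[str, List[BezierStroke]] = field(default_factory=dict)

    def add_round(self, region: str, strokes: List[BezierStroke]) -> None:
        """Integrate one round's strokes while keeping earlier rounds."""
        self.strokes_by_region.setdefault(region, []).extend(strokes)

    def all_strokes(self) -> List[BezierStroke]:
        """Flatten every region's strokes into a single drawable list."""
        return [s for group in self.strokes_by_region.values() for s in group]


# Two rounds; the region order here is an assumption, not taken from the paper.
sketch = Sketch()
sketch.add_round("foreground", [BezierStroke(((0, 0), (0, 1), (1, 1), (1, 0)))])
sketch.add_round("background", [BezierStroke(((0, 2), (1, 3), (2, 3), (3, 2)))])
```

In a full pipeline, each round would presumably adjust the control points of the newly added strokes by gradient descent on the losses before the next region is added; whether strokes from earlier rounds stay fixed or keep being optimized is not stated in the abstract.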