🤖 AI Summary
Semantic region parsing of childlike sketch drawings is challenging, and style distortion frequently occurs during animation generation. Method: We propose the first semantic hierarchical segmentation model tailored for children’s drawings. Built upon the SAM architecture, our approach introduces a hierarchical fine-tuning framework that integrates semantic-prior-guided region parsing with cross-domain generalization strategies. We also construct the first large-scale children’s drawing dataset, comprising 16,000 images with pixel-level annotations across 25 semantic classes. Contribution/Results: Our model achieves higher accuracy on children’s drawing segmentation than state-of-the-art models designed for realistic humans and cartoon figures, even after those models are fine-tuned. It enables fully automatic facial animation, figure relighting, and improvements to an existing animation method for childlike drawings while preserving stylistic fidelity. Notably, it generalizes effectively to out-of-domain hand-drawn human figures. This work establishes a novel paradigm for intelligent, style-consistent animation generation from children’s sketches.
📝 Abstract
Childlike human figure drawings represent one of humanity's most accessible forms of character expression, yet automatically analyzing their contents remains a significant challenge. While semantic segmentation of realistic humans has recently advanced considerably, existing models often fail when confronted with the abstract, representational nature of childlike drawings. Semantic understanding of these drawings is a crucial prerequisite for animation tools that seek to modify figures while preserving their unique style. To provide it, we propose a novel hierarchical segmentation model, built upon the architecture and pre-trained weights of SAM, that quickly and accurately produces these semantic labels. Our model achieves higher accuracy than state-of-the-art segmentation models focused on realistic humans and cartoon figures, even after those models are fine-tuned. We demonstrate the value of our model for semantic segmentation through multiple applications: a fully automatic facial animation pipeline, a figure relighting pipeline, improvements to an existing childlike human figure drawing animation method, and generalization to out-of-domain figures. Finally, to support future work in this area, we introduce a dataset of 16,000 childlike drawings with pixel-level annotations across 25 semantic categories. Our work can enable entirely new, easily accessible tools for hand-drawn character animation, and our dataset can enable new lines of inquiry in a variety of graphics and human-centric research fields.