🤖 AI Summary
Human creativity assessment relies heavily on expert subjective scoring, which is costly, time-consuming, and lacks transparency and interpretability.
Method: This work proposes an automatic, interpretable assessment framework for creativity based on children’s drawings. We introduce a multimodal, multitask learning architecture that, for the first time, decouples creativity into complementary content and style dimensions. A conditional learning mechanism enables dynamic feature extraction responsive to both semantic and stylistic cues. The vision backbone is trained jointly to predict creativity scores and expert-annotated content category labels.
Results: Our method significantly outperforms existing regression-based approaches on creativity scoring. Attention visualizations generated by the model exhibit strong alignment with human expert judgments, demonstrating both high predictive accuracy and intrinsic interpretability. The framework thus offers a scalable, objective, and transparent alternative to conventional expert-based evaluation.
📝 Abstract
Assessing human creativity through visual outputs, such as drawings, plays a critical role in fields including psychology, education, and cognitive science. However, current assessment practices still rely heavily on expert-based scoring, which is both labor-intensive and inherently subjective. In this paper, we propose a data-driven framework for automatic and interpretable creativity assessment from drawings. Motivated by the cognitive understanding that creativity can emerge from both what is drawn (content) and how it is drawn (style), we reinterpret the creativity score as a function of these two complementary dimensions. Specifically, we first augment an existing creativity-labeled dataset with additional annotations targeting content categories. Based on the enriched dataset, we further propose a multi-modal, multi-task learning framework that simultaneously predicts creativity scores, categorizes content types, and extracts stylistic features. In particular, we introduce a conditional learning mechanism that lets the model adapt its visual feature extraction, dynamically tuning it to creativity-relevant signals conditioned on the drawing's stylistic and semantic cues. Experimental results demonstrate that our model achieves state-of-the-art performance relative to existing regression-based approaches and offers interpretable visualizations that align well with human judgments. The code and annotations will be made publicly available at https://github.com/WonderOfU9/CSCA_PRCV_2025
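
To make the two ideas in the abstract concrete, the sketch below shows one way conditional feature extraction and joint training could look. It is not the authors' implementation: the abstract does not specify the conditioning mechanism or the loss, so the feature-wise scale-and-shift modulation in `modulate`, the combined objective in `joint_loss`, and the weight `lam` are all assumptions chosen for illustration.

```python
import math

def modulate(features, cond, w_gamma, w_beta):
    """Scale and shift each visual feature using a condition vector.

    Assumed mechanism (feature-wise linear modulation); `cond` stands in
    for the drawing's stylistic/semantic cues. Hypothetical names throughout.
    """
    out = []
    for i, f in enumerate(features):
        gamma = 1.0 + sum(w * c for w, c in zip(w_gamma[i], cond))  # per-feature scale
        beta = sum(w * c for w, c in zip(w_beta[i], cond))          # per-feature shift
        out.append(gamma * f + beta)
    return out

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def joint_loss(score_pred, score_true, content_logits, content_label, lam=0.5):
    """Squared error on the creativity score plus weighted cross-entropy
    on the content category. `lam` is an assumed trade-off weight."""
    mse = (score_pred - score_true) ** 2
    ce = -math.log(softmax(content_logits)[content_label])
    return mse + lam * ce

# With a zero condition vector the features pass through unchanged.
unchanged = modulate([1.0, 2.0], [0.0, 0.0],
                     [[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
shifted = modulate([1.0], [2.0], [[0.5]], [[1.0]])
loss = joint_loss(3.0, 3.0, [0.0, 0.0], 0, lam=1.0)
```

In this sketch a zero condition vector leaves the features untouched, so the conditioning acts as a learned adjustment on top of the backbone's output rather than a replacement for it. The stylistic feature extraction branch mentioned in the abstract is omitted here for brevity.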