🤖 AI Summary
This work addresses the computational modeling of human pareidolia—the perceptual phenomenon wherein observers perceive meaningful patterns (e.g., animal shapes) in ambiguous natural contours (e.g., clouds, rocks, flames). We propose an end-to-end framework: (1) open-vocabulary segmentation to localize salient contour regions; (2) vision-language models (VLMs) to infer plausible animal semantics from these regions; and (3) text-to-image diffusion models guided by both semantic prompts and contour masks to synthesize semantically coherent, visually aligned animal forms, seamlessly integrated into the original scene. Our key contribution is the first formulation of pareidolia as a trainable cross-modal generative task—requiring no manual annotations or predefined categories. Experiments demonstrate robustness and creativity in diverse real-world scenes: generated outputs exhibit spatial plausibility, semantic fidelity, and environmental consistency. The approach holds promise for digital art creation, visual storytelling, and interactive media applications.
📝 Abstract
Humans possess a unique ability to perceive meaningful patterns in ambiguous stimuli, a cognitive phenomenon known as pareidolia. This paper introduces the Shape2Animal framework, which mimics this imaginative capacity by reinterpreting the silhouettes of natural objects, such as clouds, stones, or flames, as plausible animal forms. Our automated framework first performs open-vocabulary segmentation to extract the object silhouette and then uses vision-language models to infer semantically appropriate animal concepts. It then synthesizes an animal image that conforms to the input shape using a text-to-image diffusion model, and seamlessly blends it into the original scene to produce visually coherent and spatially consistent compositions. We evaluated Shape2Animal on a diverse set of real-world inputs, demonstrating its robustness and creative potential. Shape2Animal can offer new opportunities for visual storytelling, educational content, digital art, and interactive media design. Our project page is here: https://shape2image.github.io
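The four stages described in the abstract (segment, interpret, synthesize, blend) can be sketched as a small pipeline. This is only an illustrative sketch, not the authors' implementation: the class name `ShapeToAnimalPipeline`, the stage names, and the stub stages are all hypothetical, images are simplified to nested lists of grayscale values, and plain per-pixel alpha compositing stands in for the blending step, whose actual method the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[float]]  # grayscale image, values in [0, 1] (simplifying assumption)
Mask = List[List[float]]   # soft mask, 1.0 inside the silhouette


def blend(scene: Image, rendered: Image, mask: Mask) -> Image:
    """Alpha compositing per pixel: mask * rendered + (1 - mask) * scene.

    Assumption: a stand-in for the paper's blending step, which is not detailed.
    """
    return [
        [m * r + (1.0 - m) * s for s, r, m in zip(s_row, r_row, m_row)]
        for s_row, r_row, m_row in zip(scene, rendered, mask)
    ]


@dataclass
class ShapeToAnimalPipeline:
    """Hypothetical wiring of the stages; each stage is a pluggable callable."""
    segment: Callable[[Image], Mask]              # stands in for open-vocabulary segmentation
    suggest_animal: Callable[[Image, Mask], str]  # stands in for the vision-language model
    generate: Callable[[str, Mask], Image]        # stands in for shape-guided diffusion

    def run(self, scene: Image) -> Image:
        mask = self.segment(scene)
        animal = self.suggest_animal(scene, mask)
        rendered = self.generate(animal, mask)
        return blend(scene, rendered, mask)


# Stub stages so the sketch runs without any model (all hypothetical).
def threshold_segment(scene: Image) -> Mask:
    return [[1.0 if p > 0.5 else 0.0 for p in row] for row in scene]


def fixed_animal(scene: Image, mask: Mask) -> str:
    return "rabbit"


def flat_render(prompt: str, mask: Mask) -> Image:
    return [[0.25 for _ in row] for row in mask]


pipeline = ShapeToAnimalPipeline(threshold_segment, fixed_animal, flat_render)
scene = [[0.9, 0.1], [0.8, 0.2]]
result = pipeline.run(scene)
print(result)
```

Keeping each stage behind a callable means a real segmentation model, vision-language model, or diffusion model could be swapped in without changing the control flow; only the masked pixels are replaced, which is what keeps the generated animal confined to the original silhouette.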