Art3D: Training-Free 3D Generation from Flat-Colored Illustration

📅 2025-04-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Pre-trained image-to-3D models often produce distorted or implausible geometry when the reference image is a flat-colored illustration, such as a hand drawing, because the image carries few depth cues. Art3D is a training-free method that addresses this by lifting flat-colored 2D designs into 3D. It uses pre-trained 2D image generation models to draw on structural and semantic features of the reference image, together with a VLM-based realism evaluation, which strengthens the three-dimensional illusion of the reference before 3D generation. The authors also collect Flat-2D, a dataset of over 100 flat-colored samples, to benchmark how well existing image-to-3D models generalize to such inputs. Experiments indicate that Art3D is robust and adapts to a wide range of painting styles. The source code and dataset are to be released on the project page.

📝 Abstract
Large-scale pre-trained image-to-3D generative models have exhibited remarkable capabilities in generating diverse shapes. However, most of them struggle to synthesize plausible 3D assets when the reference image is flat-colored, like a hand drawing, because it lacks a 3D illusion, even though such images are often the most user-friendly input modality in art content creation. To this end, we propose Art3D, a training-free method that can lift flat-colored 2D designs into 3D. By leveraging structural and semantic features with pre-trained 2D image generation models and a VLM-based realism evaluation, Art3D successfully enhances the three-dimensional illusion in reference images, thus simplifying the process of generating 3D from 2D, and proves adaptable to a wide range of painting styles. To benchmark the generalization performance of existing image-to-3D models on flat-colored images without a 3D feeling, we collect a new dataset, Flat-2D, with over 100 samples. Experimental results demonstrate the performance and robustness of Art3D, exhibiting superior generalization capacity and promising practical applicability. Our source code and dataset will be publicly available on our project page: https://joy-jy11.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Generating 3D from flat-colored 2D illustrations
Overcoming lack of 3D illusion in hand drawings
Benchmarking models on flat-colored image dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages pre-trained 2D image models
Uses VLM-based realism evaluation
Training-free 3D generation method
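
The abstract does not spell out how the 2D generation model and the VLM-based realism evaluation are combined, so the following is only a minimal sketch of one plausible arrangement: generate several candidate renderings of the reference and keep the one the evaluator scores highest. Every name here (`pick_most_realistic`, `generate`, `score_realism`, `accept_score`) is a hypothetical placeholder, and the toy functions stand in for real models; this is not the authors' implementation.

```python
from typing import Any, Callable, List, Optional, Tuple


def pick_most_realistic(
    reference: Any,
    generate: Callable[[Any, int], Any],      # hypothetical: 2D generator (reference, seed) -> image
    score_realism: Callable[[Any], float],    # hypothetical: VLM-style realism score, higher is better
    num_candidates: int = 4,
    accept_score: Optional[float] = None,     # assumed optional early-stop threshold
) -> Tuple[Any, float, List[float]]:
    """Generate candidates from a flat-colored reference and keep the best-scored one."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")

    best_image, best_score = None, float("-inf")
    scores: List[float] = []
    for seed in range(num_candidates):
        candidate = generate(reference, seed)
        score = score_realism(candidate)
        scores.append(score)
        if score > best_score:
            best_image, best_score = candidate, score
        # Stop early once a candidate is judged realistic enough.
        if accept_score is not None and score >= accept_score:
            break
    return best_image, best_score, scores


# Toy stand-ins so the sketch runs without any model weights.
def toy_generate(reference: str, seed: int) -> str:
    return f"{reference}-shaded-{seed}"


def toy_score(image: str) -> float:
    return {"0": 0.2, "1": 0.9, "2": 0.5}.get(image[-1], 0.0)


best, score, history = pick_most_realistic("cat", toy_generate, toy_score, num_candidates=3)
print(best, score, history)
```

In a real pipeline, the selected image would then be passed to an off-the-shelf image-to-3D model. Because the loop only calls existing models and never updates their weights, it stays consistent with the training-free setting described above.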