AI Summary
Visual AI training demands vast, diverse, and fault-tolerant synthetic material data, yet existing material repositories (designed for graphics applications) are small-scale, overly precise, and lack diversity. Method: we propose the first open-source, large-scale material library for AI training, comprising 500K 2D textures and corresponding SVBRDF/PBR parameters extracted from real-world images. To avoid reliance on manual annotation or physical modeling, we introduce an unsupervised two-stage framework: (1) grid-based texture cropping via statistical distribution alignment, and (2) correlation-driven stochastic mapping for PBR attribute generation. Contribution/Results: this is the first method to enable emergent, high-fidelity, and highly diverse material representation learning directly from natural images. Experiments show that visual models trained on our library generalize significantly better than baselines trained on hand-crafted, curated materials.
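The first stage, grid-based cropping via statistical distribution alignment, can be illustrated with a minimal sketch. This is an assumption-laden toy version, not the repository's actual pipeline: each cell is summarized by its per-channel mean and standard deviation, and cells whose descriptor lies close to the grid-wide median are flagged as part of a statistically uniform region. The function name, cell size, and threshold are all hypothetical.

```python
import numpy as np

def uniform_texture_mask(image, cell=16, thresh=0.15):
    """Hypothetical sketch of grid-based distribution alignment.

    Splits the image into cell x cell patches, describes each patch by
    its per-channel mean and std, and marks patches whose descriptor is
    close to the median descriptor over the whole grid.
    """
    h, w, c = image.shape
    gh, gw = h // cell, w // cell
    # Per-cell statistics: (gh, gw, 2*c) descriptor of means and stds.
    cells = image[:gh * cell, :gw * cell].reshape(gh, cell, gw, cell, c)
    mean = cells.mean(axis=(1, 3))
    std = cells.std(axis=(1, 3))
    desc = np.concatenate([mean, std], axis=-1)
    # A cell belongs to a "uniform" region when its statistics align
    # with the grid-wide median descriptor.
    med = np.median(desc.reshape(-1, desc.shape[-1]), axis=0)
    dist = np.linalg.norm(desc - med, axis=-1)
    return dist < thresh  # boolean mask over the cell grid

# A perfectly flat image: every cell's statistics align, so the whole
# grid is flagged as a uniform texture region.
flat = np.full((64, 64, 3), 0.5)
mask = uniform_texture_mask(flat)
```

In a real pipeline one would then crop the largest rectangle of flagged cells; here the mask itself is the output for simplicity.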
Abstract
Vastextures is a vast repository of 500,000 textures and PBR materials extracted from real-world images through an unsupervised process. The extracted materials and textures are extremely diverse and cover a wide range of real-world patterns, but are less refined than those in existing repositories. The repository is composed of 2D textures cropped from natural images and SVBRDF/PBR materials generated from these textures. Textures and PBR materials are essential for CGI. Existing material repositories focus on games, animation, and art, which demand a limited number of high-quality assets. However, virtual worlds and synthetic data are becoming increasingly important for training AI systems for computer vision. This application demands a huge number of diverse assets but is less affected by noisy and unrefined ones. Vastextures aims to address this need with a free, huge, and diverse asset repository that covers as many real-world materials as possible. The materials are extracted from natural images automatically in two steps: 1) Scanning a vast number of images to identify and crop regions with uniform textures. This is done by splitting each image into a grid of cells and identifying regions in which all of the cells share a similar statistical distribution. 2) Extracting the properties of the PBR material from the cropped texture. This is done by randomly guessing a correlation between the properties of the texture image and the properties of the PBR material. The resulting PBR materials exhibit a vast range of real-world patterns as well as unexpected emergent properties. Neural networks trained on this repository outperformed networks trained on handcrafted assets.
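The second step, randomly guessing a correlation between texture-image properties and PBR properties, can be sketched as follows. This is a minimal illustrative version under stated assumptions, not the repository's exact rule: each PBR map (roughness, metallic, height) is derived from the texture's grayscale intensity via a randomly drawn sign and scale, so each property ends up randomly correlated or anti-correlated with image brightness. The function name, property set, and sampling ranges are all hypothetical.

```python
import numpy as np

def texture_to_pbr(texture, rng=None):
    """Hypothetical sketch of correlation-driven stochastic mapping.

    Produces PBR property maps as random affine functions of the
    texture's normalized grayscale intensity.
    """
    rng = np.random.default_rng() if rng is None else rng
    gray = texture.mean(axis=-1)  # H x W intensity proxy
    # Normalize to [0, 1] so the random mapping stays well-behaved.
    span = gray.max() - gray.min()
    gray = (gray - gray.min()) / span if span > 0 else np.zeros_like(gray)
    pbr = {"base_color": texture}
    for prop in ("roughness", "metallic", "height"):
        sign = rng.choice([-1.0, 1.0])   # random correlation direction
        scale = rng.uniform(0.2, 1.0)    # random correlation strength
        offset = rng.uniform(0.0, 1.0 - scale)
        prop_map = offset + scale * (gray if sign > 0 else 1.0 - gray)
        pbr[prop] = np.clip(prop_map, 0.0, 1.0)
    return pbr

# Guess a random PBR material from a random 8x8 texture crop.
rng = np.random.default_rng(0)
tex = rng.random((8, 8, 3))
maps = texture_to_pbr(tex, rng)
```

Because the sign and scale are resampled per property and per texture, the same crop can yield many distinct materials, which is one plausible source of the "unexpected emergent properties" the abstract mentions.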