🤖 AI Summary
This work targets the gap between visual textures and the generative mechanisms that produce them, pushing AI beyond surface-level perception (describing what an image looks like) toward causal understanding (inferring the process that formed it).
Method: We introduce SciTextures, the first interdisciplinary dataset to explicitly link visual patterns to executable generation code, comprising over 1,200 scientific, technical, and artistic generative models and 100,000 corresponding texture images. We build an agentic AI pipeline that automates model collection, code implementation, standardized packaging, and image simulation. We also design a "reverse-engineer → execute → compare" evaluation paradigm that takes natural texture images as input and assesses the ability of vision-language models (VLMs) to model and reproduce the underlying generative process.
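To make the paradigm concrete, below is a minimal sketch of the reverse-engineer → execute → compare loop, not the paper's actual implementation. The helper `query_vlm`, the prompt wording, and the fixed output filename `simulated.png` are illustrative assumptions.

```python
import subprocess
import tempfile
import textwrap
from pathlib import Path

def reverse_engineer_and_simulate(image_path: str, query_vlm) -> Path:
    """Hypothetical sketch: ask a VLM to infer a generative model for a
    texture image, execute the code it returns, and hand back the path of
    the simulated image for later comparison with the real one."""
    prompt = textwrap.dedent("""\
        Identify the physical or generative process that most likely formed
        the pattern in this image. Write a self-contained Python script that
        simulates this process and saves the result as 'simulated.png'.""")

    # 1) Reverse-engineer: the VLM returns runnable code for its inferred model.
    #    `query_vlm(image, prompt)` is an assumed callable, not a real API.
    code = query_vlm(image=image_path, prompt=prompt)

    # 2) Execute: run the generated script in an isolated working directory.
    workdir = Path(tempfile.mkdtemp())
    (workdir / "model.py").write_text(code)
    subprocess.run(["python", "model.py"], cwd=workdir, check=True, timeout=300)

    # 3) Compare: the caller scores simulated.png against the real image with
    #    a texture-similarity metric (one possible metric is sketched later).
    return workdir / "simulated.png"
```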
Contribution/Results: Experiments show that current VLMs can, at least partially, identify complex generative mechanisms, infer plausible underlying models, and produce simulated textures that resemble the originals, establishing the first quantifiable benchmark for causal visual understanding.
📝 Abstract
The ability to connect visual patterns with the processes that form them represents one of the deepest forms of visual understanding. Textures of clouds and waves, the growth of cities and forests, or the formation of materials and landscapes are all examples of patterns emerging from underlying mechanisms. We present the SciTextures dataset, a large-scale collection of textures and visual patterns from all domains of science, technology, and art, along with the models and code that generate these images. Covering over 1,200 different models and 100,000 images of patterns and textures from physics, chemistry, biology, sociology, technology, mathematics, and art, this dataset offers a way to explore the connection between the visual patterns that shape our world and the mechanisms that produce them. The dataset was created by an agentic AI pipeline that autonomously collects and implements models in standardized form. We use SciTextures to evaluate the ability of leading AI models to link visual patterns to the models and code that generate them, and to identify different patterns that emerged from the same process. We also test AI's ability to infer and recreate the mechanisms behind visual patterns: given a natural image of a real-world pattern, the AI must identify, model, and code the mechanism that formed the pattern, then run this code to generate a simulated image that is compared to the real image. These benchmarks show that vision-language models (VLMs) can understand and simulate the physical systems behind visual patterns. The dataset and code are available at: https://zenodo.org/records/17485502
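The abstract does not specify how the simulated image is compared to the real one, so the sketch below uses one plausible stand-in: Gram matrices of VGG-16 features, a standard texture statistic. The chosen layers and distance are assumptions, not the paper's metric.

```python
import torch
import torchvision.transforms as T
from torchvision.models import vgg16, VGG16_Weights
from PIL import Image

# Assumed metric (not from the paper): Gram matrices of VGG-16 features;
# a smaller distance means the two images have more similar texture statistics.
_vgg = vgg16(weights=VGG16_Weights.DEFAULT).features.eval()
_prep = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def _gram(feat: torch.Tensor) -> torch.Tensor:
    # Channel-by-channel feature correlations, normalized by feature size.
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

@torch.no_grad()
def texture_distance(real_path: str, sim_path: str, layers=(3, 8, 15, 22)) -> float:
    """Mean Gram-matrix distance between two images over selected VGG layers."""
    grams = []
    for path in (real_path, sim_path):
        x = _prep(Image.open(path).convert("RGB")).unsqueeze(0)
        per_image = []
        for idx, layer in enumerate(_vgg):
            x = layer(x)
            if idx in layers:
                per_image.append(_gram(x))
            if idx == max(layers):
                break
        grams.append(per_image)
    return sum(torch.norm(g_r - g_s).item()
               for g_r, g_s in zip(*grams)) / len(layers)
```

In use, the real photograph and the image produced by the VLM-generated code would both be passed to `texture_distance`, with lower scores indicating a closer match between the inferred mechanism's output and the real-world pattern.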