MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models

📅 2025-09-26
🤖 AI Summary
Procedural material node graph synthesis relies heavily on expert knowledge, yet existing neural program synthesis approaches model node graphs solely as textual sequences, neglecting the visual-spatial structure that makes them readable and editable. To address this, we propose the first multimodal program synthesis framework designed specifically for material node graphs: it jointly models visual inputs (node layout and connectivity) and textual inputs (semantic descriptions), incorporates a visually-spatially aware large multimodal model, and integrates a syntax-constrained tree search algorithm to guarantee the structural validity and functional correctness of generated programs. We also introduce a new dataset of high-quality, production-grade material node graphs. Experiments demonstrate that our method significantly outperforms text-only baselines on both unconditional and conditional generation, achieving state-of-the-art visual fidelity, functional correctness, and user interpretability.

📝 Abstract
Material node graphs are programs that generate the 2D channels of procedural materials, including geometry such as roughness and displacement maps, and reflectance such as albedo and conductivity maps. They are essential in computer graphics for representing the appearance of virtual 3D objects parametrically and at arbitrary resolution. In particular, their directed acyclic graph structures and intermediate states provide an intuitive understanding and workflow for interactive appearance modeling. Creating such graphs is a challenging task and typically requires professional training. While recent neural program synthesis approaches attempt to simplify this process, they solely represent graphs as textual programs, failing to capture the inherently visual-spatial nature of node graphs that makes them accessible to humans. To address this gap, we present MultiMat, a multimodal program synthesis framework that leverages large multimodal models to process both visual and textual graph representations for improved generation of procedural material graphs. We train our models on a new dataset of production-quality procedural materials and combine them with a constrained tree search inference algorithm that ensures syntactic validity while efficiently navigating the program space. Our experimental results show that our multimodal program synthesis method is more efficient in both unconditional and conditional graph synthesis with higher visual quality and fidelity than text-only baselines, establishing new state-of-the-art performance.
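To make the abstract's framing concrete — node graphs as directed acyclic programs whose evaluation yields 2D material channels — here is a toy sketch. All node types, names, and wiring are illustrative assumptions, not the paper's actual node vocabulary:

```python
# Toy material node graph: a DAG of image operations evaluated in
# topological order to produce output channels such as albedo and
# roughness. Images are plain 2D lists of floats for simplicity.
def checker(w, h):
    """Generator node: a w-by-h checkerboard pattern."""
    return [[(x + y) % 2 * 1.0 for x in range(w)] for y in range(h)]

def invert(img):
    """Filter node: 1 - value, per pixel."""
    return [[1.0 - v for v in row] for row in img]

def blend(a, b, t=0.5):
    """Filter node: linear blend of two images."""
    return [[va * (1 - t) + vb * t for va, vb in zip(ra, rb)]
            for ra, rb in zip(a, b)]

# Each node is (name, op, input node names). The list is given in
# topological order, so every input is computed before it is used.
GRAPH = [
    ("base",      lambda _:   checker(2, 2),          []),
    ("inverted",  lambda ins: invert(ins[0]),         ["base"]),
    ("albedo",    lambda ins: blend(ins[0], ins[1]),  ["base", "inverted"]),
    ("roughness", lambda ins: ins[0],                 ["inverted"]),
]

def evaluate(graph):
    """Evaluate the DAG; intermediate states are cached by node name,
    which is what makes each step inspectable during editing."""
    results = {}
    for name, op, inputs in graph:
        results[name] = op([results[i] for i in inputs])
    return results
```

The cached intermediate results mirror the "intermediate states" the abstract credits with making node graphs intuitive for interactive appearance modeling.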
Problem

Research questions and friction points this paper is trying to address.

Generating procedural material graphs from multimodal inputs
Overcoming limitations of text-only program synthesis methods
Creating valid material node graphs with visual quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses multimodal models for visual-textual graph synthesis
Trains on production-quality procedural materials dataset
Applies a constrained tree search at inference to guarantee valid program generation
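The last point — a search over partial programs that only ever enqueues grammar-valid expansions — can be sketched as follows. The node vocabulary, scoring stand-in, and wiring rule are all hypothetical; a real system would query the trained multimodal model for scores:

```python
import heapq

# Hypothetical mini-grammar: each node type has a fixed input arity, and
# every input must reference an already-created node, so any completed
# program is a valid DAG by construction.
NODE_TYPES = {
    "noise": 0,    # generator, 0 inputs
    "blur": 1,     # filter, 1 input
    "blend": 2,    # filter, 2 inputs
    "output": 1,   # terminates the program
}

def valid_expansions(graph):
    """Enumerate grammar-valid next steps for a partial program."""
    n = len(graph)
    for ntype, arity in NODE_TYPES.items():
        if arity > n:
            continue  # not enough existing nodes to wire the inputs
        # Simplistic wiring choice: connect to the most recent nodes.
        yield (ntype, tuple(range(n - arity, n)))

def score(graph, step):
    """Stand-in for a model log-probability; here it just prefers
    terminating once the graph has at least two nodes."""
    ntype, _ = step
    return 1.0 if (ntype == "output" and len(graph) >= 2) else 0.1

def constrained_search(max_nodes=4):
    """Best-first search over partial programs. Invalid expansions are
    never enqueued, so every returned program is syntactically valid."""
    heap = [(0.0, [])]  # (negated cumulative score, partial graph)
    while heap:
        neg, graph = heapq.heappop(heap)
        if graph and graph[-1][0] == "output":
            return graph  # complete, valid program
        if len(graph) >= max_nodes:
            continue
        for step in valid_expansions(graph):
            heapq.heappush(heap, (neg - score(graph, step), graph + [step]))
    return None
```

Pruning at expansion time, rather than validating finished programs, is what lets the search navigate the program space efficiently: no budget is spent completing candidates that are already invalid.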