What You Think Is What You Get: Bridge User Intent and Transfer Function Design through Multimodal Large Language Models

📅 2025-06-23
🤖 AI Summary
Direct volume rendering suffers from a semantic gap between user intent and transfer function (TF) parameters. As a result, TF design is unintuitive, the parameter space to search is exponentially large, and existing methods generalize poorly. To address these challenges, this paper proposes the first intelligent TF optimization framework that integrates multimodal large language models (MLLMs) with evolutionary strategies. Methodologically, we introduce an MLLM-driven rendering quality evaluator that replaces subjective human assessment, together with an evolutionary-algorithm-based TF parameter explorer augmented with interactive optimization. Crucially, we place the MLLM inside the closed TF design loop, enabling end-to-end mapping from natural language descriptions to high-fidelity volume renderings. Extensive evaluation on multiple real-world volumetric datasets demonstrates that our framework significantly lowers the barrier to TF design, improves generalization across diverse data and tasks, and accelerates optimization convergence. Ablation studies confirm that each component (MLLM-based evaluation, evolutionary search, and interactive refinement) makes a statistically significant contribution to overall performance.
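For readers new to TFs, it may help to see what "TF parameters" can look like concretely. The sketch below assumes a common 1D representation: a list of control points, each pairing a normalized scalar value with an RGBA color and opacity, interpolated linearly in between. The names `ControlPoint` and `apply_tf` are hypothetical, and this is not necessarily the parameterization used in the paper.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple

RGBA = Tuple[float, float, float, float]


@dataclass
class ControlPoint:
    value: float  # normalized scalar position in [0, 1]
    rgba: RGBA    # color and opacity assigned at that position, each in [0, 1]


def apply_tf(points: List[ControlPoint], s: float) -> RGBA:
    """Piecewise-linear lookup of the RGBA that a hypothetical 1D TF assigns to scalar s."""
    pts = sorted(points, key=lambda p: p.value)
    # Clamp outside the range covered by the control points.
    if s <= pts[0].value:
        return pts[0].rgba
    if s >= pts[-1].value:
        return pts[-1].rgba
    xs = [p.value for p in pts]
    i = bisect_right(xs, s)  # guarantees pts[i - 1].value <= s < pts[i].value
    lo, hi = pts[i - 1], pts[i]
    t = (s - lo.value) / (hi.value - lo.value)
    # Interpolate each of the four channels independently.
    return tuple(a + t * (b - a) for a, b in zip(lo.rgba, hi.rgba))
```

With k control points, this representation already has 5k continuous parameters (one position plus four RGBA channels each), which hints at why searching the TF space by hand is hard.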

📝 Abstract
Direct volume rendering (DVR) is a fundamental technique for visualizing volumetric data, and transfer functions (TFs) play a crucial role in extracting meaningful structures. However, designing effective TFs remains unintuitive because of the semantic gap between user intent and the TF parameter space. Researchers have developed numerous TF optimization methods to bridge this gap, yet existing methods still face two challenges: a large exploration space and weak generalizability. To address these issues, we propose the What You Think Is What You Get (WYTWYG) framework, which leverages Multimodal Large Language Models (MLLMs) to guide TF optimization according to user intent. Specifically, we first introduce a novel TF optimization approach comprising two core components: (1) an evolution-based explorer for effective exploration of the TF space, and (2) an MLLM-based volume rendering quality evaluator that provides generalizable visual guidance. We further build an interactive TF design system on top of this approach. We demonstrate the general applicability of our framework through three case studies and validate the effectiveness of each component through extensive experiments. Our code is available at: https://github.com/wyysteelhead/TFevolve.
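The pairing of an evolution-based explorer with a quality evaluator can be illustrated with a generic evolutionary loop. The sketch below is an illustration under stated assumptions, not the WYTWYG algorithm: it evolves flat vectors of TF parameters using elitist selection, uniform crossover, and Gaussian mutation, and it takes a `score` callable as a hypothetical stand-in for the MLLM-based evaluator (which, per the abstract, supplies visual guidance on renderings). The function names and default hyperparameters are assumptions.

```python
import random
from typing import Callable, List

Genome = List[float]  # flattened TF parameters, each kept in [0, 1]


def mutate(g: Genome, sigma: float, rng: random.Random) -> Genome:
    """Gaussian perturbation of every gene, clipped back into [0, 1]."""
    return [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in g]


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Uniform crossover: each gene is copied from one parent at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def evolve(score: Callable[[Genome], float], dim: int, pop_size: int = 20,
           generations: int = 50, sigma: float = 0.1, seed: int = 0) -> Genome:
    """Elitist evolutionary search. `score` is a hypothetical stand-in for an
    MLLM-based evaluator of the rendering a genome produces (higher is better)."""
    if pop_size < 8:
        raise ValueError("pop_size must be at least 8 so the elite holds two parents")
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)  # rank candidates by evaluator score
        elite = pop[: pop_size // 4]       # top quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)    # two distinct elite parents
            children.append(mutate(crossover(a, b, rng), sigma, rng))
        pop = elite + children
    return max(pop, key=score)
```

As a toy check, `score` can be a closed-form function such as the negative squared distance to a target vector. In a real system each call would presumably involve rendering the volume and querying an MLLM, so population size and generation count would likely need to stay small; elitism keeps the best candidate found so far from being lost between expensive evaluations.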
Problem

Research questions and friction points this paper is trying to address.

Bridging the semantic gap between user intent and transfer function design
Reducing the large exploration space in transfer function optimization
Improving the generalizability of transfer function optimization methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses MLLMs to guide TF optimization
Integrates an evolution-based TF space explorer
Employs an MLLM-based rendering quality evaluator
Authors
Yiyao Wang, State Key Lab of CAD&CG, Zhejiang University
Bo Pan, State Key Lab of CAD&CG, Zhejiang University
Ke Wang, Zhejiang University of Technology
Han Liu, State Key Lab of CAD&CG, Zhejiang University
Jinyuan Mao, State Key Lab of CAD&CG, Zhejiang University
Yuxin Liu, State Key Lab of CAD&CG, Zhejiang University
Minfeng Zhu, Zhejiang University
Bo Zhang, Zhejiang University
Weifeng Chen, Zhejiang University of Finance & Economics
Xiuqi Huang, Zhejiang University
Wei Chen, State Key Lab of CAD&CG, Zhejiang University; Laboratory of Art and Archaeology Image (Zhejiang University), Ministry of Education, China