🤖 AI Summary
Direct volume rendering suffers from a semantic gap between user intent and transfer function (TF) parameters, resulting in unintuitive design, an exponentially large parameter space, and poor generalizability. To address these challenges, this paper proposes the first intelligent TF optimization framework integrating multimodal large language models (MLLMs) with evolutionary strategies. Methodologically, we introduce an MLLM-driven rendering quality evaluator to replace subjective human assessment and design an evolutionary algorithm–based TF parameter explorer augmented with interactive optimization. Crucially, we incorporate the MLLM into the TF design closed loop, enabling end-to-end mapping from natural language descriptions to high-fidelity volume renderings. Extensive evaluation on multiple real-world volumetric datasets demonstrates that our framework significantly lowers the barrier to TF design, improves generalization across diverse data and tasks, and accelerates optimization convergence. Ablation studies confirm that each component—MLLM-based evaluation, evolutionary search, and interactive refinement—makes a statistically significant contribution to overall performance.
📝 Abstract
Direct volume rendering (DVR) is a fundamental technique for visualizing volumetric data, with transfer functions (TFs) playing a crucial role in extracting meaningful structures. However, designing effective TFs remains unintuitive due to the semantic gap between user intent and the TF parameter space. Researchers have developed numerous TF optimization methods to bridge this gap, but existing methods still face two challenges: a large exploration space and weak generalizability. To address these issues, we propose the What You Think is What You Get (WYTWYG) framework, which leverages Multimodal Large Language Models (MLLMs) to guide TF optimization based on user intent. Specifically, we first introduce a novel TF optimization approach comprising two core components: (1) an evolution-based explorer for effective exploration of the TF space, and (2) a volume rendering quality evaluator based on MLLMs to provide generalizable visual guidance. We further propose a TF interactive design system based on this approach. We demonstrate the general applicability of our framework through three case studies, and validate the effectiveness of each component through extensive experiments. Our code is available at: https://github.com/wyysteelhead/TFevolve.
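The closed loop described above, where an evolutionary explorer proposes TF parameters and an MLLM-based evaluator scores the resulting renderings, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the TF is reduced to a list of opacity control points, and `mock_mllm_score` is a hypothetical stand-in for rendering the volume and asking an MLLM to rate the image against the user's intent.

```python
import random

def mock_mllm_score(tf_params):
    """Stand-in for the MLLM rendering-quality evaluator (assumption).

    Here we pretend the 'ideal' TF has evenly spaced opacity control
    points; the real system would render the volume with tf_params and
    query an MLLM for a quality score conditioned on the user's text intent.
    """
    n = len(tf_params)
    target = [i / (n - 1) for i in range(n)]
    return -sum((p - t) ** 2 for p, t in zip(tf_params, target))

def evolve_tf(n_points=8, pop_size=20, generations=50, sigma=0.1, seed=0):
    """Simple (mu + lambda)-style evolutionary search over TF parameters."""
    rng = random.Random(seed)
    # Each individual is a candidate TF: n_points opacity values in [0, 1].
    pop = [[rng.random() for _ in range(n_points)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=mock_mllm_score, reverse=True)
        elite = scored[: pop_size // 4]            # keep the best quarter
        pop = list(elite)
        while len(pop) < pop_size:                 # refill with mutated elites
            parent = rng.choice(elite)
            child = [min(1.0, max(0.0, p + rng.gauss(0, sigma)))
                     for p in parent]
            pop.append(child)
    return max(pop, key=mock_mllm_score)

best_tf = evolve_tf()
```

In the actual framework, the scoring step would be far more expensive (a render plus an MLLM call per candidate), so population sizes and evaluation budgets would be chosen accordingly; interactive refinement would let the user steer the evaluator between generations.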