VVGT: Visual Volume-Grounded Transformer

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets the trade-off among resolution, real-time interactivity, and scalability in Direct Volume Rendering (DVR), along with the limited scalability of existing volumetric 3D Gaussian Splatting (3DGS) methods, which rely on costly per-scene optimization. It introduces VVGT, a feed-forward framework that maps volumetric data directly to a surface-free 3D Gaussian representation using a dual-transformer network. Its Volume Geometry Forcing mechanism, an epipolar cross-attention module, fuses multi-view observations into distributed Gaussian primitives. Because no per-scene optimization is required, the method supports zero-shot generalization. The authors report high-quality visualization across diverse datasets, orders-of-magnitude faster conversion, and improved geometric consistency.

📝 Abstract

Volumetric visualization has long been dominated by Direct Volume Rendering (DVR), which operates on dense voxel grids and suffers from limited scalability as resolution and interactivity demands increase. Recent advances in 3D Gaussian Splatting (3DGS) offer a representation-centric alternative; however, existing volumetric extensions still depend on costly per-scene optimization, limiting scalability and interactivity. We present VVGT (Visual Volume-Grounded Transformer), a feed-forward, representation-first framework that directly maps volumetric data to a 3D Gaussian Splatting representation, advancing a new paradigm for volumetric visualization beyond DVR. Unlike prior feed-forward 3DGS methods designed for surface-centric reconstruction, VVGT explicitly accounts for volumetric rendering, where each pixel aggregates contributions along a ray. VVGT employs a dual-transformer network and introduces Volume Geometry Forcing, an epipolar cross-attention mechanism that integrates multi-view observations into distributed 3D Gaussian primitives without surface assumptions. This design eliminates per-scene optimization while enabling accurate volumetric representations. Extensive experiments show that VVGT achieves high-quality visualization with orders-of-magnitude faster conversion, improved geometric consistency, and strong zero-shot generalization across diverse datasets, enabling truly interactive and scalable volumetric visualization. The code will be publicly released upon acceptance.
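
The abstract's point that each pixel aggregates contributions along a ray can be illustrated with standard front-to-back alpha compositing, the accumulation rule commonly used in both DVR and Gaussian splatting. The sketch below is a generic illustration, not the VVGT implementation; the function name `composite_ray` and its inputs are hypothetical and assume the samples are already sorted from near to far.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_ray(colors: Sequence[RGB], alphas: Sequence[float]) -> Tuple[RGB, float]:
    """Front-to-back alpha compositing of samples ordered near to far.

    Hypothetical helper for illustration only (not from the VVGT paper).
    Returns the accumulated RGB color and the remaining transmittance.
    """
    if len(colors) != len(alphas):
        raise ValueError("colors and alphas must have the same length")

    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed along the ray
    for color, alpha in zip(colors, alphas):
        weight = transmittance * alpha  # contribution of this sample to the pixel
        for channel in range(3):
            pixel[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha  # attenuate light for samples further back
    return (pixel[0], pixel[1], pixel[2]), transmittance


if __name__ == "__main__":
    # A half-opaque red sample in front of a half-opaque blue sample.
    rgb, remaining = composite_ray([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [0.5, 0.5])
    print(rgb, remaining)
```

Every semi-transparent sample along the ray contributes to the final color, so there is no single surface point that determines the pixel. This is why the abstract contrasts volumetric rendering with surface-centric reconstruction.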

Problem

Research questions and friction points this paper is trying to address.

volumetric visualization

scalability

interactivity

Direct Volume Rendering

3D Gaussian Splatting

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting

Volume Rendering

Transformer