🤖 AI Summary
Existing neural global illumination (GI) methods typically rely on per-scene optimization or operate solely in 2D screen space, resulting in poor generalization, view inconsistency, and limited spatial understanding. This work introduces a 3D light transport embedding framework designed for cross-scene generalization. Given point clouds annotated with geometric and material features as input, it employs a scalable Transformer to model inter-point interactions and encode the scene into implicit neural primitives; at render time, each query point retrieves nearby primitives via nearest-neighbor search and aggregates their latent features through cross-attention, so light transport is learned directly in 3D space without rasterized or path-traced cues. The framework supports spatial-directional radiance field estimation, rapid transfer to new rendering tasks, and more efficient unbiased path guiding. Experiments demonstrate high-fidelity diffuse GI prediction across diverse indoor scenes with varying layouts, geometry, and materials; the embedding adapts to new rendering tasks with limited fine-tuning, and preliminary results indicate it can accelerate glossy-material rendering and path guiding.
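As a rough illustration of the encoding stage described above, the sketch below (not the authors' code) maps a point cloud carrying per-point geometric and material attributes to latent "neural primitives" using a standard Transformer encoder. The feature layout (position, normal, albedo), dimensions, and layer counts are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class LightTransportEncoder(nn.Module):
    """Hypothetical scene encoder: point features -> latent neural primitives."""
    def __init__(self, feat_dim: int = 9, latent_dim: int = 256,
                 num_layers: int = 6, num_heads: int = 8):
        super().__init__()
        # Per-point input features, e.g. position (3) + normal (3) + albedo (3).
        self.embed = nn.Linear(feat_dim, latent_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=latent_dim, nhead=num_heads,
            dim_feedforward=4 * latent_dim, batch_first=True)
        # Self-attention models global point-to-point interactions.
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, feat_dim) -> (batch, num_points, latent_dim)
        tokens = self.embed(points)
        return self.encoder(tokens)

# Example: encode a toy scene of 4096 annotated points into neural primitives.
scene = torch.randn(1, 4096, 9)
primitives = LightTransportEncoder()(scene)   # (1, 4096, 256)
```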
📝 Abstract
Global illumination (GI) is essential for realistic rendering but remains computationally expensive due to the complexity of simulating indirect light transport. Recent neural methods have mainly relied on per-scene optimization, sometimes extended to handle changes in camera or geometry. Efforts toward cross-scene generalization have largely stayed in 2D screen space, such as neural denoising or G-buffer based GI prediction, which often suffer from view inconsistency and limited spatial understanding. We propose a generalizable 3D light transport embedding that approximates global illumination directly from 3D scene configurations, without using rasterized or path-traced cues. Each scene is represented as a point cloud with geometric and material features. A scalable transformer models global point-to-point interactions to encode these features into neural primitives. At render time, each query point retrieves nearby primitives via nearest-neighbor search and aggregates their latent features through cross-attention to predict the desired rendering quantity. We demonstrate results on diffuse global illumination prediction across diverse indoor scenes with varying layouts, geometry, and materials. The embedding trained for irradiance estimation can be quickly adapted to new rendering tasks with limited fine-tuning. We also present preliminary results for spatial-directional radiance field estimation for glossy materials and show how the normalized field can accelerate unbiased path guiding. This approach highlights a path toward integrating learned priors into rendering pipelines without explicit ray-traced illumination cues.
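The render-time query described in the abstract can be pictured with the following sketch (again an assumption-laden illustration, not the paper's implementation): each shading point retrieves its k nearest encoded scene points and fuses their latent features via cross-attention to predict a rendering quantity such as irradiance. The neighbor count, dimensions, and output head below are hypothetical choices.

```python
import torch
import torch.nn as nn

class PrimitiveQueryHead(nn.Module):
    """Hypothetical query stage: kNN retrieval + cross-attention over primitives."""
    def __init__(self, latent_dim: int = 256, num_heads: int = 8, out_dim: int = 3):
        super().__init__()
        self.query_embed = nn.Linear(3, latent_dim)        # embed the 3D query position
        self.cross_attn = nn.MultiheadAttention(latent_dim, num_heads, batch_first=True)
        self.out = nn.Linear(latent_dim, out_dim)          # e.g. RGB irradiance

    def forward(self, query_pos, point_pos, primitives, k: int = 16):
        # query_pos:  (Q, 3) shading points
        # point_pos:  (N, 3) positions of the encoded scene points
        # primitives: (N, D) latent features produced by the scene encoder
        dists = torch.cdist(query_pos, point_pos)            # (Q, N) pairwise distances
        knn_idx = dists.topk(k, largest=False).indices       # (Q, k) nearest neighbors
        neighbors = primitives[knn_idx]                       # (Q, k, D) retrieved latents
        q = self.query_embed(query_pos).unsqueeze(1)          # (Q, 1, D) query token
        fused, _ = self.cross_attn(q, neighbors, neighbors)  # attend over retrieved primitives
        return self.out(fused.squeeze(1))                     # (Q, out_dim) predicted quantity

# Example: predict irradiance at 1024 query points. The primitives tensor here is a
# random stand-in for the output of the encoder sketch above.
query_pos = torch.rand(1024, 3)
point_pos = torch.rand(4096, 3)
irradiance = PrimitiveQueryHead()(query_pos, point_pos, torch.randn(4096, 256))
```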