Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting

📅 2025-11-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses text-driven, training-free 3D scene relighting—a challenging problem requiring photorealistic illumination editing without retraining. We propose GS-Light, the first multi-view relighting framework tailored for Gaussian Splatting (GS) scenes and supporting position-aware textual lighting instructions. Methodologically, we extend single-input diffusion models to training-free multi-view relighting: a large vision-language model parses lighting semantics (e.g., direction, color, intensity, reference objects), while geometric priors—including depth, surface normals, and semantic segmentation—are fused into geometry-constrained latent codes; multi-view consistency optimization further ensures artistic yet coherent rendering. Evaluated on diverse indoor and outdoor GS scenes, GS-Light outperforms prior methods in multi-view consistency, image fidelity, and semantic alignment. A user study confirms its superior visual quality and perceptual realism.

📝 Abstract
We introduce GS-Light, an efficient, textual position-aware pipeline for text-guided relighting of 3D scenes represented via Gaussian Splatting (3DGS). GS-Light implements a training-free extension of a single-input diffusion model to handle multi-view inputs. Given a user prompt that may specify lighting direction, color, intensity, or reference objects, we employ a large vision-language model (LVLM) to parse the prompt into lighting priors. Using off-the-shelf estimators for geometry and semantics (depth, surface normals, and semantic segmentation), we fuse these lighting priors with view-geometry constraints to compute illumination maps and generate initial latent codes for each view. These carefully derived initial latents guide the diffusion model to generate relighting outputs that more accurately reflect user expectations, especially in terms of lighting direction. By feeding multi-view rendered images, along with the initial latents, into our multi-view relighting model, we produce high-fidelity, artistically relit images. Finally, we fine-tune the 3DGS scene with the relit appearance to obtain a fully relit 3D scene. We evaluate GS-Light on both indoor and outdoor scenes, comparing it to state-of-the-art baselines including per-view relighting, video relighting, and scene editing methods. Using quantitative metrics (multi-view consistency, image quality, aesthetic score, semantic similarity, etc.) and qualitative assessment (user studies), GS-Light demonstrates consistent improvements over baselines. Code and assets will be made available upon publication.
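The fusion step above — combining parsed lighting priors with per-view geometry to compute an illumination map — can be illustrated with a minimal sketch. This is not the paper's actual method: the keyword-based `parse_lighting_prompt` is a hypothetical stand-in for the LVLM parsing stage, and the shading model is plain Lambertian, assuming unit surface normals from an off-the-shelf estimator.

```python
import numpy as np

def parse_lighting_prompt(prompt):
    # Hypothetical stand-in for the LVLM parser: map a few keywords
    # to a light direction and color (the real system extracts richer
    # priors such as intensity and reference objects).
    direction = np.array([0.0, 0.0, 1.0])
    if "left" in prompt:
        direction = np.array([-1.0, 0.0, 0.3])
    elif "right" in prompt:
        direction = np.array([1.0, 0.0, 0.3])
    color = np.array([1.0, 0.85, 0.6]) if "warm" in prompt else np.ones(3)
    return direction / np.linalg.norm(direction), color

def illumination_map(normals, light_dir, light_color, ambient=0.2):
    # Lambertian shading per pixel: clamp(n . l), lifted by an ambient
    # floor and tinted by the light color. normals is (H, W, 3), unit length.
    lambert = np.clip(normals @ light_dir, 0.0, 1.0)          # (H, W)
    shading = ambient + (1.0 - ambient) * lambert             # (H, W)
    return shading[..., None] * light_color                   # (H, W, 3)

# Toy view: a flat surface whose normals all face the camera (+z).
normals = np.zeros((4, 4, 3))
normals[..., 2] = 1.0
light_dir, color = parse_lighting_prompt("warm light from the left")
illum = illumination_map(normals, light_dir, color)
```

In the full pipeline, a map like `illum` would modulate the encoded view before noise injection, so that the diffusion model's initial latents already carry the requested lighting direction.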
Problem

Research questions and friction points this paper is trying to address.

Achieving text-guided 3D scene relighting without training requirements
Extending single-view diffusion models to handle multi-view inputs effectively
Generating lighting-consistent relit 3D scenes from textual descriptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free multi-view extension of diffusion model
LVLM parses prompts into lighting priors
Fuses lighting priors with view-geometry constraints
Jiangnan Ye
Jiedong Zhuang
Lianrui Mu
Wenjie Zheng
Jiaqi Hu (Rice University; Genentech)
Xingze Zou
Jing Wang
Haoji Hu (Zhejiang University, China)