Structural Energy-Guided Sampling for View-Consistent Text-to-3D

📅 2025-08-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the “Janus problem” in text-to-3D generation, where an object looks plausible from the front but shows duplicated or distorted geometry from other viewpoints, through a training-free optimization applied at the sampling stage. The core idea is a structural energy function defined in the PCA subspace of intermediate U-Net features; its gradients are injected into the denoising trajectory to steer geometry toward the intended viewpoint and enforce multi-view consistency. Integrated into SDS/VSD frameworks, the method regulates 3D structure during diffusion sampling without modifying model weights. The authors report a significant reduction in Janus artifacts, along with improved geometric alignment and viewpoint consistency. Because it operates entirely at sampling time and requires no retraining, it works as a plug-and-play addition to existing text-to-3D pipelines.

📝 Abstract
Text-to-3D generation often suffers from the Janus problem, where objects look correct from the front but collapse into duplicated or distorted geometry from other angles. We attribute this failure to viewpoint bias in 2D diffusion priors, which propagates into 3D optimization. To address this, we propose Structural Energy-Guided Sampling (SEGS), a training-free, plug-and-play framework that enforces multi-view consistency entirely at sampling time. SEGS defines a structural energy in a PCA subspace of intermediate U-Net features and injects its gradients into the denoising trajectory, steering geometry toward the intended viewpoint while preserving appearance fidelity. Integrated seamlessly into SDS/VSD pipelines, SEGS significantly reduces Janus artifacts, achieving improved geometric alignment and viewpoint consistency without retraining or weight modification.
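As a rough illustration of how sampling-time guidance could enter an SDS pipeline, the block below writes the standard SDS gradient next to a generic energy-guided noise estimate. The symbols $E$ (structural energy), $\lambda$ (guidance scale), and $\tilde{\epsilon}$ are notation assumed here for illustration; the abstract does not give the exact form SEGS uses.

```latex
% Standard SDS gradient for 3D parameters \theta, rendered image x, prompt y:
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
      \bigl(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta} \right]

% Assumed energy-guided noise estimate (generic energy guidance,
% not necessarily the paper's exact rule):
\tilde{\epsilon}(x_t; y, t)
  = \hat{\epsilon}_\phi(x_t; y, t) + \lambda\,\nabla_{x_t} E(x_t)

% Substituting \tilde{\epsilon} for \hat{\epsilon}_\phi in the SDS gradient:
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}^{\mathrm{guided}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
      \bigl(\tilde{\epsilon}(x_t; y, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta} \right]
```

Under this reading, setting $\lambda = 0$ recovers plain SDS, and a larger $\lambda$ pushes each update further toward lower structural energy.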
Problem

Research questions and friction points this paper is trying to address.

Addresses the Janus problem in text-to-3D generation
Reduces viewpoint bias inherited from 2D diffusion priors
Enforces multi-view consistency during the sampling phase
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free plug-and-play framework
Structural energy-guided sampling method
PCA subspace gradient injection technique
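
The three ideas above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the PCA subspace is reduced to its single leading direction, the energy is an assumed quadratic mismatch against a reference, and the features themselves stand in for the variable being guided (in a real pipeline the gradient would be backpropagated through the U-Net to the noisy latent by an autodiff framework). All function names are hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_principal_direction(rows, iters=200):
    """Leading PCA direction of a list of feature vectors, via power iteration."""
    n, c = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(c)]
    centered = [[r[j] - mean[j] for j in range(c)] for r in rows]
    v = [1.0 / math.sqrt(c)] * c  # unit-norm starting vector
    for _ in range(iters):
        # Apply the unnormalised covariance X^T X to v without forming it.
        proj = [dot(r, v) for r in centered]
        w = [sum(p * r[j] for p, r in zip(proj, centered)) for j in range(c)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def structural_energy(features, reference, direction):
    """Assumed energy: half the squared mismatch projected onto `direction`."""
    return 0.5 * sum(
        dot([f - r for f, r in zip(fi, ri)], direction) ** 2
        for fi, ri in zip(features, reference)
    )


def energy_gradient(features, reference, direction):
    """Analytic gradient of structural_energy w.r.t. features (direction fixed)."""
    grads = []
    for fi, ri in zip(features, reference):
        coeff = dot([f - r for f, r in zip(fi, ri)], direction)
        grads.append([coeff * d for d in direction])
    return grads


def inject_gradient(eps, grad, scale):
    """Add the scaled energy gradient to a noise estimate (generic guidance form)."""
    return [[e + scale * g for e, g in zip(er, gr)] for er, gr in zip(eps, grad)]


# Toy data: 6 "tokens" with 4 channels each; values are random placeholders.
random.seed(0)
features = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]
reference = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]

direction = top_principal_direction(reference)
grad = energy_gradient(features, reference, direction)

before = structural_energy(features, reference, direction)
# One descent step of size 0.5 along the negative energy gradient.
stepped = [[f - 0.5 * g for f, g in zip(fr, gr)] for fr, gr in zip(features, grad)]
after = structural_energy(stepped, reference, direction)
```

Because the direction has unit norm, a step of size 0.5 halves every projected mismatch, so `after` is one quarter of `before`. Only the component inside the chosen subspace changes, which is one way to read the claim that structure is steered while appearance is left alone.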
Qing Zhang
The Australian National University
Jinguang Tong
Australian National University
computer vision, 3D reconstruction
Jie Hong
The University of Hong Kong
Jing Zhang
The Australian National University
Xuesong Li
The Australian National University, CSIRO