AI Summary
This work addresses the limited generation quality of text-guided 3D diffusion models in the *inference-time-only* setting, i.e., without additional training. To this end, we propose a *noise optimization framework* that operates solely at inference time. Our core method reframes 3D generation as a search problem over the initial Gaussian noise, solved via a *verifier-driven, gradient-free optimization algorithm*. To enhance stability, efficiency, and geometric diversity, we introduce three key components: (i) Gaussian normalization, which keeps noise candidates close to a standard Gaussian distribution during the search, (ii) SVD-based dimensionality reduction, which compresses the high-dimensional 3D search space, and (iii) a dynamic singular-space reset mechanism, which mitigates optimization stagnation. Extensive experiments across multiple text-to-3D benchmarks show consistent improvements in FID, Chamfer distance, and CLIP-Score, supporting both the effectiveness and generalizability of this inference-time noise optimization paradigm.
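The search described in this summary (a scoring function guiding a gradient-free search over Gaussian noise, with normalization applied to every candidate) can be illustrated with a minimal sketch. It is not the ITS3D implementation: `toy_verifier` is a hypothetical stand-in that scores the noise directly, whereas a real pipeline would score the 3D asset the diffusion model generates from that noise. The hill-climbing update and all parameter names (`steps`, `pop`, `sigma`) are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_normalize(z, eps=1e-8):
    """Rescale a noise array to zero mean and unit variance."""
    return (z - z.mean()) / (z.std() + eps)

def toy_verifier(z):
    """Hypothetical verifier (higher is better). A real one would score
    the generated 3D asset, not the noise itself."""
    target = np.linspace(-1.0, 1.0, z.size).reshape(z.shape)
    return -float(np.mean((z - target) ** 2))

def search_noise(shape, verifier, steps=50, pop=8, sigma=0.3, seed=0):
    """Gradient-free hill climbing over the initial noise."""
    rng = np.random.default_rng(seed)
    best = gaussian_normalize(rng.standard_normal(shape))
    best_score = verifier(best)
    for _ in range(steps):
        for _ in range(pop):
            # Perturb the current best, then pull the candidate back
            # toward a standard Gaussian before scoring it.
            cand = gaussian_normalize(best + sigma * rng.standard_normal(shape))
            score = verifier(cand)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score

best, score = search_noise((4, 16), toy_verifier)
```

Because every candidate is renormalized before it is scored, the returned noise keeps zero mean and unit variance no matter how many updates were accepted, which is the distribution-shift correction the summary refers to.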
Abstract
We explore inference-time scaling in text-guided 3D diffusion models to enhance generative quality without additional training. To this end, we introduce ITS3D, a framework that formulates the task as an optimization problem: identifying the most effective Gaussian noise input. The framework is driven by a verifier-guided search algorithm that iteratively refines noise candidates based on verifier feedback. To address the inherent challenges of 3D generation, we introduce three techniques for improved stability, efficiency, and exploration capability. 1) Gaussian normalization is applied to stabilize the search process; it corrects distribution shifts when noise candidates deviate from a standard Gaussian distribution during iterative updates. 2) Because the high-dimensional 3D search space increases computational complexity, a singular value decomposition-based compression technique is employed to reduce dimensionality while preserving effective search directions. 3) To further prevent convergence to suboptimal local minima, a singular space reset mechanism dynamically updates the search space based on diversity measures. Extensive experiments demonstrate that ITS3D enhances text-to-3D generation quality, showing the potential of computationally efficient search methods in generative processes. The source code is available at https://github.com/ZhenglinZhou/ITS3D.
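As a rough illustration of techniques 2) and 3), the sketch below searches over a few coefficients in a low-dimensional subspace obtained from an SVD, and rebuilds that subspace when a diversity measure drops. It is a sketch under stated assumptions, not the ITS3D code: the basis is taken from random Gaussian samples, the diversity measure is assumed to be the spread of verifier scores within a round, and `toy_verifier`, `min_diversity`, and the other parameter names are hypothetical.

```python
import numpy as np

def normalize(z, eps=1e-8):
    """Rescale a noise vector to zero mean and unit variance."""
    return (z - z.mean()) / (z.std() + eps)

def top_k_basis(samples, k):
    """Top-k right singular vectors (orthonormal rows) of a sample matrix."""
    _, _, vt = np.linalg.svd(samples, full_matrices=False)
    return vt[:k]

def toy_verifier(z):
    """Hypothetical verifier (higher is better), scoring the noise directly."""
    target = np.linspace(-1.0, 1.0, z.size)
    return -float(np.mean((z - target) ** 2))

def subspace_search(dim, verifier, k=4, pop=16, rounds=20,
                    sigma=0.5, min_diversity=0.05, seed=0):
    rng = np.random.default_rng(seed)
    basis = top_k_basis(rng.standard_normal((pop, dim)), k)
    best = normalize(rng.standard_normal(dim))
    best_score = verifier(best)
    resets = 0
    for _ in range(rounds):
        # Sample k coefficients per candidate instead of dim noise values.
        coeffs = sigma * rng.standard_normal((pop, k))
        cands = np.stack([normalize(best + c @ basis) for c in coeffs])
        scores = np.array([verifier(c) for c in cands])
        i = int(scores.argmax())
        if scores[i] > best_score:
            best, best_score = cands[i], float(scores[i])
        # Assumed diversity measure: spread of scores in this round.
        # A low spread suggests the current directions are exhausted.
        if scores.std() < min_diversity:
            basis = top_k_basis(rng.standard_normal((pop, dim)), k)
            resets += 1
    return best, best_score, resets

best, score, resets = subspace_search(64, toy_verifier)
```

Each round draws `pop * k` coefficients rather than `pop * dim` noise values, which is the source of the efficiency gain in this toy setting; the reset swaps in a fresh set of directions so the search is not confined to one subspace.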