🤖 AI Summary
Traditional Score Distillation Sampling (SDS) for 3D generation often suffers from over-saturated and over-smoothed results; negative prompting mitigates these issues but introduces a trade-off between texture enhancement and shape fidelity. This paper proposes Target-Balanced Score Distillation (TBSD), motivated by an analysis of how negative prompts that embed target information (Target Negative Prompts, TNP) jointly affect texture and geometry. TBSD formulates generation as a multi-objective optimization problem and introduces an adaptive strategy that balances texture realism against geometric accuracy during optimization. Built upon pretrained 2D text-to-image diffusion models, it requires no auxiliary networks or explicit supervision. Experiments show that TBSD outperforms state-of-the-art methods, generating 3D assets with both high-fidelity textures and geometrically accurate shapes.
📝 Abstract
Score Distillation Sampling (SDS) enables 3D asset generation by distilling priors from pretrained 2D text-to-image diffusion models, but vanilla SDS suffers from over-saturation and over-smoothing. To mitigate these issues, recent variants incorporate negative prompts. However, these methods face a critical trade-off: they either achieve limited texture improvement or obtain significant texture gains at the cost of shape distortion. In this work, we first conduct a systematic analysis and reveal that this trade-off is fundamentally governed by how the negative prompts are used: Target Negative Prompts (TNP), which embed target information in the negative prompts, dramatically enhance texture realism and fidelity but induce shape distortions. Informed by this insight, we introduce Target-Balanced Score Distillation (TBSD). It formulates generation as a multi-objective optimization problem and introduces an adaptive strategy that effectively resolves this trade-off. Extensive experiments demonstrate that TBSD significantly outperforms existing state-of-the-art methods, yielding 3D assets with high-fidelity textures and geometrically accurate shapes.
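
The abstract does not spell out TBSD's objectives or its adaptive strategy, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes two hypothetical update directions (`g_texture` and `g_shape`, names chosen here for illustration) and balances them with the well-known closed-form min-norm weight for two objectives, a technique swapped in from multi-task learning. The helper `guided_direction` shows the generic guidance-style difference between noise predictions under a target prompt and a negative prompt; all function names are assumptions.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def guided_direction(eps_target: Vector, eps_negative: Vector,
                     guidance: float = 1.0) -> Vector:
    """Guidance-style direction: scaled difference between the noise predicted
    under the target prompt and under the negative prompt (generic form,
    not the paper's exact formulation)."""
    return [guidance * (p - n) for p, n in zip(eps_target, eps_negative)]


def min_norm_weight(g_texture: Vector, g_shape: Vector) -> float:
    """Weight alpha in [0, 1] minimizing ||alpha*g_texture + (1-alpha)*g_shape||.
    This is the standard two-objective min-norm solution, used here as a
    stand-in for an adaptive balancing rule (assumption)."""
    diff = [a - b for a, b in zip(g_texture, g_shape)]
    denom = dot(diff, diff)
    if denom == 0.0:
        return 0.5  # identical directions: any weight gives the same result
    alpha = dot([b - a for a, b in zip(g_texture, g_shape)], g_shape) / denom
    return min(1.0, max(0.0, alpha))


def balanced_direction(g_texture: Vector, g_shape: Vector) -> Tuple[float, Vector]:
    """Combine the two hypothetical directions with the adaptive weight."""
    alpha = min_norm_weight(g_texture, g_shape)
    combined = [alpha * a + (1.0 - alpha) * b for a, b in zip(g_texture, g_shape)]
    return alpha, combined


if __name__ == "__main__":
    # Toy vectors standing in for per-parameter gradients.
    alpha, g = balanced_direction([1.0, 0.0], [0.0, 1.0])
    print(alpha, g)
```

In this sketch the weight is recomputed from the current pair of directions at every step, so neither objective dominates when the two conflict; a real system would obtain the directions from a diffusion model and backpropagate through a differentiable renderer.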