🤖 AI Summary
Existing 3D texture generation methods rely on iterative calls to 2D text-to-image models, lack explicit geometric awareness, and fail to align with subjective human preferences. To address these limitations, we propose the first end-to-end differentiable preference learning framework for 3D texture synthesis, embedding geometry-aware reward functions directly into the generative pipeline. Specifically, we design four differentiable rewards, curvature consistency, normal alignment, occlusion robustness, and view invariance, and integrate them with differentiable rendering and 2D diffusion models to enable gradient-driven optimization. Evaluated on natural language–driven 3D texture generation, our method significantly improves texture fidelity, geometric compatibility, and alignment with human preferences. It achieves high photorealism, precise controllability, and interpretable reward-driven optimization, establishing a novel paradigm for preference-based 3D content generation.
📝 Abstract
Recent advances in 3D generative models have achieved impressive results, but the 3D content these models generate may not align with subjective human preferences or task-specific criteria. Moreover, a core challenge in 3D texture generation remains: most existing approaches rely on repeated calls to 2D text-to-image generative models, which lack an inherent understanding of the 3D structure of the input mesh. To address this, we propose an end-to-end differentiable preference learning framework that back-propagates human preferences, represented by differentiable reward functions, through the entire 3D generative pipeline, making the process inherently geometry-aware. We demonstrate the effectiveness of our framework with four novel geometry-aware reward functions, offering a more controllable and interpretable pathway for high-quality 3D content creation from natural language.
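To make the recipe described above concrete, here is a minimal, purely illustrative sketch, not the paper's implementation: several geometry-aware reward terms are combined into a single differentiable objective, and texture parameters are updated by gradient ascent on it. In the real framework the rewards would be evaluated on renders of the textured mesh through a differentiable renderer and gradients would flow back through the 2D diffusion model; here each reward is an invented toy placeholder (their internal definitions are assumptions, only their names come from the text), and backpropagation is replaced by numerical differentiation.

```python
import numpy as np

# Illustrative stand-ins for the four geometry-aware rewards named in the
# abstract. Their actual definitions here are placeholders, NOT the paper's.

def curvature_consistency(tex):
    # placeholder: prefer smooth variation between neighboring texels
    return -np.sum(np.diff(tex) ** 2)

def normal_alignment(tex):
    # placeholder: prefer values near an assumed target shade of 0.5
    return -np.sum((tex - 0.5) ** 2)

def occlusion_robustness(tex):
    # placeholder: penalize saturated texels via a squared hinge
    return -np.sum(np.maximum(np.abs(tex - 0.5) - 0.4, 0.0) ** 2)

def view_invariance(tex):
    # placeholder: prefer low variance, i.e. consistent appearance
    return -np.var(tex)

REWARDS = [curvature_consistency, normal_alignment,
           occlusion_robustness, view_invariance]

def total_reward(tex, weights):
    # weighted combination of all reward terms into one scalar objective
    return weights @ np.array([r(tex) for r in REWARDS])

def numeric_grad(f, tex, eps=1e-5):
    # central finite differences stand in for backprop through the pipeline
    grad = np.zeros_like(tex)
    for i in range(tex.size):
        e = np.zeros_like(tex)
        e[i] = eps
        grad[i] = (f(tex + e) - f(tex - e)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
tex = rng.random(8)            # toy 1-D "texture" parameters
weights = np.full(4, 0.25)     # equal weighting of the four rewards
for _ in range(300):           # gradient ascent on the combined reward
    tex += 0.1 * numeric_grad(lambda t: total_reward(t, weights), tex)
```

With these toy quadratic rewards the parameters converge to a flat mid-gray texture; the point of the sketch is only the structure of the loop, reward terms aggregated into one scalar and optimized end-to-end by gradient steps, which is the geometry-aware, preference-driven optimization pattern the abstract describes.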