🤖 AI Summary
Existing 3D generative models prioritize aesthetic quality while neglecting physical stability, particularly self-support under gravity. This paper introduces the Direct Simulation Optimization (DSO) framework, the first to feed the output of a non-differentiable physics simulator directly back into a feed-forward 3D generative pipeline, so that physical constraints are optimized during training rather than at test time. Its core innovation is the Direct Reward Optimization (DRO) objective: without paired preferences or ground-truth 3D annotations, DRO drives unsupervised self-improvement through a closed loop of self-sampling, physics simulation (in PyBullet) to score stability, and fine-tuning on those scores. Fine-tuned with either DPO or the new DRO objective, the diffusion-based generator produces physically plausible, self-supporting 3D objects far more reliably than test-time optimization while being over 100× faster, and does so without any 3D annotations.
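The paper scores stability by simulating each generated object under gravity in PyBullet. As a hypothetical, much-simplified illustration of what such a stability check decides, the sketch below applies the classic static criterion instead: an object resting on a flat floor is stable when its center of mass projects inside the support polygon of its contact points. The function name and inputs are assumptions for illustration, not the paper's API.

```python
# Simplified static-stability proxy (illustrative only; the paper runs a full
# PyBullet simulation and measures pose change). An object on a flat floor is
# statically stable if the horizontal projection of its center of mass lies
# inside the convex support polygon formed by its contact points.

def is_statically_stable(com_xy, support_polygon):
    """Return True if com_xy lies inside the convex support polygon.

    com_xy:          (x, y) horizontal projection of the center of mass
    support_polygon: list of (x, y) contact points, ordered counter-clockwise
    """
    x, y = com_xy
    n = len(support_polygon)
    for i in range(n):
        x1, y1 = support_polygon[i]
        x2, y2 = support_polygon[(i + 1) % n]
        # Cross product of the edge vector with the vector to the point:
        # negative means the point is on the outside of a CCW edge.
        if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) < 0:
            return False
    return True

# A box whose feet span the unit square:
feet = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(is_statically_stable((0.5, 0.5), feet))  # COM over the base -> True
print(is_statically_stable((1.4, 0.5), feet))  # COM overhangs the base -> False
```

A dynamic simulation like the paper's also catches objects that tip over after small perturbations, which this purely geometric test cannot.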
📝 Abstract
Most 3D object generators focus on aesthetic quality, often neglecting physical constraints necessary in applications. One such constraint is that the 3D object should be self-supporting, i.e., it remains balanced under gravity. Prior approaches to generating stable 3D objects used differentiable physics simulators to optimize geometry at test time, which is slow, unstable, and prone to local optima. Inspired by the literature on aligning generative models to external feedback, we propose Direct Simulation Optimization (DSO), a framework that uses feedback from a (non-differentiable) simulator to increase the likelihood that the 3D generator outputs stable 3D objects directly. We construct a dataset of 3D objects labeled with a stability score obtained from the physics simulator. We can then fine-tune the 3D generator using the stability score as the alignment metric, via direct preference optimization (DPO) or direct reward optimization (DRO), a novel objective we introduce to align diffusion models without requiring pairwise preferences. Our experiments show that the fine-tuned feed-forward generator, using either the DPO or the DRO objective, is much faster and more likely to produce stable objects than test-time optimization. Notably, the DSO framework works even without any ground-truth 3D objects for training, allowing the 3D generator to self-improve by automatically collecting simulation feedback on its own outputs.
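The abstract's key idea of aligning the generator to a scalar stability score without pairwise preferences can be sketched, very roughly, as a reward-weighted denoising loss. The code below is a hypothetical NumPy illustration of that flavor of objective, not the paper's exact DRO loss: samples with above-average stability rewards pull the diffusion loss down, while below-average samples are pushed away.

```python
import numpy as np

# Illustrative sketch (assumed form, NOT the paper's exact DRO objective):
# each self-generated sample receives a scalar stability reward from the
# simulator, and its per-sample denoising loss is weighted by the centered
# reward, so stable samples are reinforced and unstable ones suppressed
# without any "winner vs. loser" preference pairs.

def reward_weighted_denoising_loss(pred_noise, true_noise, rewards):
    """Reward-weighted denoising loss over a batch of self-generated samples.

    pred_noise, true_noise: (batch, dim) arrays from the diffusion model
    rewards:                (batch,) stability scores in [0, 1] from the simulator
    """
    per_sample = np.mean((pred_noise - true_noise) ** 2, axis=1)  # (batch,)
    weights = rewards - rewards.mean()  # center: unstable samples get negative weight
    return float(np.mean(weights * per_sample))

rng = np.random.default_rng(0)
pred = rng.normal(size=(4, 8))
true = rng.normal(size=(4, 8))
rewards = np.array([1.0, 1.0, 0.0, 0.0])  # two stable, two unstable samples
print(reward_weighted_denoising_loss(pred, true, rewards))
```

In the actual framework this gradient signal fine-tunes the generator's weights, which is why inference stays a single fast feed-forward pass rather than a per-object test-time optimization.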