DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness

📅 2025-03-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D generative models prioritize aesthetic quality while neglecting physical stability, particularly self-support. This paper introduces the Direct Simulation Optimization (DSO) framework, which embeds feedback from a non-differentiable physics simulator directly into the generative pipeline, so the generator itself can be optimized against physical constraints. Its core contribution is the Direct Reward Optimization (DRO) objective: without paired preferences or ground-truth 3D annotations, DRO drives unsupervised self-improvement through a closed loop of self-sampling, physics simulation (using PyBullet), and gradient-based fine-tuning on the resulting stability scores. Combining diffusion-based generation with either Direct Preference Optimization (DPO) or DRO yields substantial gains in efficiency (over 100× speedup compared to test-time optimization) and physical plausibility. Critically, it generates highly stable 3D objects under zero-shot, annotation-free conditions.
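The stability score in the closed loop above comes from dropping a generated mesh into the physics simulator and checking whether it stays upright once it settles. The simulation step itself would use PyBullet (load the mesh, step the simulation under gravity, read back the settled pose); the snippet below sketches only the scoring step, and the quaternion pose format plus the 10° tilt threshold are illustrative assumptions, not values taken from the paper:

```python
import math

def rotation_angle(q1, q2):
    """Angle (radians) of the relative rotation between two unit quaternions (x, y, z, w).

    For unit quaternions, |q1 . q2| = cos(angle / 2), so the relative
    rotation angle is 2 * acos(|q1 . q2|).
    """
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    dot = min(1.0, dot)  # guard acos against floating-point overshoot
    return 2.0 * math.acos(dot)

def stability_score(initial_quat, final_quat, threshold_deg=10.0):
    """Binary stability label: 1.0 if the object barely rotated while settling, else 0.0.

    In a PyBullet-based pipeline, final_quat would come from
    p.getBasePositionAndOrientation() after stepping the simulation.
    """
    tilt = math.degrees(rotation_angle(initial_quat, final_quat))
    return 1.0 if tilt <= threshold_deg else 0.0
```

An object that topples over (a rotation near 90°) scores 0.0, while one that only jitters within the threshold scores 1.0; these per-sample labels are what the fine-tuning objectives consume.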

📝 Abstract
Most 3D object generators focus on aesthetic quality, often neglecting physical constraints necessary in applications. One such constraint is that the 3D object should be self-supporting, i.e., it remains balanced under gravity. Prior approaches to generating stable 3D objects used differentiable physics simulators to optimize geometry at test-time, which is slow, unstable, and prone to local optima. Inspired by the literature on aligning generative models to external feedback, we propose Direct Simulation Optimization (DSO), a framework to use the feedback from a (non-differentiable) simulator to increase the likelihood that the 3D generator outputs stable 3D objects directly. We construct a dataset of 3D objects labeled with a stability score obtained from the physics simulator. We can then fine-tune the 3D generator using the stability score as the alignment metric, via direct preference optimization (DPO) or direct reward optimization (DRO), a novel objective we introduce to align diffusion models without requiring pairwise preferences. Our experiments show that the fine-tuned feed-forward generator, using either the DPO or the DRO objective, is much faster and more likely to produce stable objects than test-time optimization. Notably, the DSO framework works even without any ground-truth 3D objects for training, allowing the 3D generator to self-improve by automatically collecting simulation feedback on its own outputs.
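For context on the DPO baseline the abstract references: the standard DPO objective from the preference-alignment literature (not restated on this page) compares a fine-tuned policy $\pi_\theta$ against a frozen reference $\pi_{\mathrm{ref}}$ on preferred/rejected pairs $(x_w, x_l)$:

```latex
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x_w,\, x_l)}
\left[ \log \sigma\!\left( \beta \left(
\log \frac{\pi_\theta(x_w)}{\pi_{\mathrm{ref}}(x_w)} -
\log \frac{\pi_\theta(x_l)}{\pi_{\mathrm{ref}}(x_l)}
\right) \right) \right]
```

DRO, as the abstract describes it, removes the need for the pairwise structure $(x_w, x_l)$ and works from per-sample stability scores instead.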
Problem

Research questions and friction points this paper is trying to address.

Align 3D generators with physical stability constraints
Make generated 3D objects self-supporting under gravity without slow test-time optimization
Optimize generator outputs using feedback from a non-differentiable simulator
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses simulation feedback for 3D stability
Fine-tunes generator via DPO or DRO
Self-improves without ground-truth data
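The bullets above hinge on turning simulator scores into a training signal. Below is a minimal sketch of how binary stability scores could be grouped into the winner/loser pairs that DPO-style fine-tuning consumes (DRO, per the abstract, skips the pairing and uses the scores directly); `score_fn` and the 0.5 cutoff are illustrative assumptions, not details from the paper:

```python
def build_preference_pairs(samples, score_fn, threshold=0.5):
    """Split self-sampled generations into stable winners and unstable losers
    by simulator score, then form every (winner, loser) pair for DPO-style
    fine-tuning. No ground-truth 3D data is involved: both sides of each
    pair come from the generator's own outputs."""
    scored = [(s, score_fn(s)) for s in samples]
    winners = [s for s, sc in scored if sc >= threshold]
    losers = [s for s, sc in scored if sc < threshold]
    return [(w, l) for w in winners for l in losers]
```

This is what makes the loop self-improving: each fine-tuning round re-samples from the updated generator, re-scores in simulation, and rebuilds the pairs.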