🤖 AI Summary
Diffusion models for 3D LiDAR scene completion suffer from slow sampling, and existing score distillation methods accelerate inference at the cost of degraded performance.
Method: This paper proposes Distillation-DPO, the first framework to integrate Direct Preference Optimization (DPO) into diffusion distillation. It constructs preference pairs using non-differentiable LiDAR evaluation metrics, such as Chamfer Distance and F-Score, to guide the student model toward the teacher's score function. A joint optimization mechanism combines paired-noise generation with learning driven by the score difference between teacher and student.
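A minimal PyTorch sketch of the pair-construction idea follows; `student.sample` and `student.latent_shape` are hypothetical placeholders rather than the authors' API, and the real pipeline may rank completions with several metrics rather than Chamfer Distance alone.

```python
import torch

def chamfer_distance(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer Distance between point clouds of shape (N, 3) and (M, 3)."""
    d = torch.cdist(pred, gt)                       # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

@torch.no_grad()
def build_preference_pair(student, sparse_scan, gt_scene):
    """Sample two completions from different initial noises and rank them with a
    non-differentiable metric (here Chamfer Distance against the ground truth)."""
    noise_a = torch.randn(student.latent_shape)     # hypothetical attribute
    noise_b = torch.randn(student.latent_shape)
    scene_a = student.sample(sparse_scan, noise_a)  # hypothetical sampler API
    scene_b = student.sample(sparse_scan, noise_b)
    cd_a = chamfer_distance(scene_a, gt_scene)
    cd_b = chamfer_distance(scene_b, gt_scene)
    # Lower Chamfer Distance wins; the metric only needs to rank the pair,
    # not to be differentiable.
    return (scene_a, scene_b) if cd_a < cd_b else (scene_b, scene_a)
```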
Contribution/Results: By bypassing the need for differentiable losses, Distillation-DPO enables end-to-end alignment with the evaluation objectives that matter in practice. Experiments demonstrate that the method achieves higher-quality completion than state-of-the-art diffusion-based approaches while accelerating inference by more than 5×.
📝 Abstract
The application of diffusion models in 3D LiDAR scene completion is limited by diffusion's slow sampling speed. Score distillation accelerates diffusion sampling but with performance degradation, while post-training with Direct Preference Optimization (DPO) boosts performance using preference data. This paper proposes Distillation-DPO, a novel diffusion distillation framework for LiDAR scene completion with preference alignment. First, the student model generates paired completion scenes from different initial noises. Second, using LiDAR scene evaluation metrics as preferences, we construct winning and losing sample pairs. Such construction is reasonable, since most LiDAR scene metrics are informative but non-differentiable and thus cannot be optimized directly. Third, Distillation-DPO optimizes the student model by exploiting the difference in score functions between the teacher and student models on the paired completion scenes. This procedure is repeated until convergence. Extensive experiments demonstrate that, compared to state-of-the-art LiDAR scene completion diffusion models, Distillation-DPO achieves higher-quality scene completion while accelerating the completion speed by more than 5-fold. To the best of our knowledge, our method is the first to explore adopting preference learning in distillation, and it provides insights into preference-aligned distillation. Our code is publicly available at https://github.com/happyw1nd/DistillationDPO.
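As a rough illustration of the third step, one plausible form of the score-difference optimization, by analogy with DPO-style objectives for diffusion models, is sketched below; `teacher.add_noise`, `teacher.num_timesteps`, and the `beta` weighting are illustrative assumptions, not the paper's published loss.

```python
import torch
import torch.nn.functional as F

def distillation_dpo_step(student, teacher, x_win, x_lose, beta=0.1):
    """One hypothetical preference-aligned distillation step: make the student match
    the frozen teacher's score more closely on the winning scene than on the losing one."""
    b = x_win.shape[0]
    t = torch.randint(0, teacher.num_timesteps, (b,), device=x_win.device)
    noise = torch.randn_like(x_win)
    xw_t = teacher.add_noise(x_win, noise, t)   # hypothetical forward-diffusion helper
    xl_t = teacher.add_noise(x_lose, noise, t)

    with torch.no_grad():                       # the frozen teacher provides target scores
        eps_tw = teacher(xw_t, t)
        eps_tl = teacher(xl_t, t)
    err_win = (student(xw_t, t) - eps_tw).pow(2).flatten(1).mean(dim=1)
    err_lose = (student(xl_t, t) - eps_tl).pow(2).flatten(1).mean(dim=1)

    # DPO-style objective: reward a smaller teacher-matching error on the winner
    # than on the loser, instead of matching both branches equally.
    return -F.logsigmoid(-beta * (err_win - err_lose)).mean()
```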