🤖 AI Summary
This work addresses the high statistical cost of training diffusion models for multi-objective tasks: a generalist model that achieves good Pareto trade-offs may need far more capacity than any single task requires, while paired annotations are scarce. To mitigate this, the authors propose a semi-supervised two-stage framework that exploits abundant unlabeled condition data. First, lightweight specialist models are trained on a small set of paired data; then, their knowledge is distilled into a generalist model using generated pseudo-samples, reducing the number of paired samples required. Theoretically, the study establishes generalization bounds for multi-objective learning with diffusion models, showing that the required number of paired samples depends only on the complexity of the specialist model classes, and extends this analysis to diffusion policies under the distribution shift induced by on-policy rollouts. Experiments on robotic control and image restoration tasks are reported to verify the theoretical results.
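The two-stage procedure can be illustrated with a deliberately small stand-in. The sketch below is not the authors' implementation: it assumes 1-D regression in place of diffusion models, linear least-squares fits as the lightweight specialists, and a single cubic model as the higher-capacity generalist. Stage 1 fits one specialist per task on a handful of paired samples; stage 2 labels abundant unlabeled conditions with each specialist (the pseudo-samples) and fits the generalist with a task-weighted loss. All names (`two_stage`, `TASKS`, the task weights) are hypothetical, and with a real diffusion model the pseudo-labels would be samples drawn from the specialist rather than its mean prediction.

```python
import random


def weighted_lstsq(features, targets, weights):
    """Minimize sum_i weights[i] * (features[i] . coef - targets[i])**2.

    Solves the normal equations with Gaussian elimination (partial pivoting).
    """
    n = len(features[0])
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for phi, y, w in zip(features, targets, weights):
        for i in range(n):
            b[i] += w * phi[i] * y
            for j in range(n):
                A[i][j] += w * phi[i] * phi[j]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))) / A[i][i]
    return coef


def linear_features(x):  # small specialist class (assumption): y ~ a + b*x
    return [1.0, x]


def cubic_features(x):  # larger generalist class (assumption): y ~ cubic in x
    return [1.0, x, x * x, x ** 3]


def predict(coef, phi):
    return sum(c * p for c, p in zip(coef, phi))


def two_stage(tasks, task_weights, n_paired=5, n_unlabeled=500, noise=0.0, seed=0):
    """Hypothetical toy version of specialist fitting followed by distillation.

    tasks: list of (sample_condition(rng), true_target(x)) pairs.
    Stage 1: one linear specialist per task from n_paired labeled pairs.
    Stage 2: label n_unlabeled conditions per task with that task's specialist
    (pseudo-samples) and fit one cubic generalist by weighted least squares.
    """
    rng = random.Random(seed)
    specialists = []
    for sample_x, target in tasks:
        xs = [sample_x(rng) for _ in range(n_paired)]
        ys = [target(x) + rng.gauss(0.0, noise) for x in xs]
        feats = [linear_features(x) for x in xs]
        specialists.append(weighted_lstsq(feats, ys, [1.0] * n_paired))
    feats, pseudo_ys, ws = [], [], []
    for (sample_x, _), spec, lam in zip(tasks, specialists, task_weights):
        for _ in range(n_unlabeled):
            x = sample_x(rng)  # unlabeled condition: the true target is never used
            feats.append(cubic_features(x))
            pseudo_ys.append(predict(spec, linear_features(x)))
            ws.append(lam / n_unlabeled)  # task weight, averaged over its samples
    return specialists, weighted_lstsq(feats, pseudo_ys, ws)


def grid_mse(coef, features, lo, hi, target, n=101):
    """Mean squared error against the true target on a uniform grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return sum((predict(coef, features(x)) - target(x)) ** 2 for x in xs) / n


# Hypothetical toy tasks: (condition sampler, ground-truth target).
TASKS = [
    (lambda rng: rng.uniform(0.0, 1.0), lambda x: 2.0 * x + 1.0),
    (lambda rng: rng.uniform(2.0, 3.0), lambda x: 5.0 - x),
]
```

With noise-free pairs the specialists recover each task's line exactly, and shifting the task weights from `[0.9, 0.1]` to `[0.1, 0.9]` moves the generalist's error from the second task to the first, which is the kind of trade-off a single shared model has to make.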
📝 Abstract
Diffusion models are increasingly used as powerful conditional generators, yet real deployments often involve multiple target distributions arising from different tasks, e.g., diverse prompt domains in text-to-image generation, or multiple environments in robotics with diffusion policies. This naturally leads to a multi-objective learning (MOL) problem. A key challenge is that achieving good Pareto trade-offs can require a generalist model class with substantially larger capacity than what suffices to solve any individual task, thereby increasing statistical cost since sample complexity typically scales with the model complexity. To reconcile this, we develop a principled MOL framework for diffusion models with limited data: a semi-supervised regime where paired (labeled) samples are scarce but (unlabeled) condition data are abundant. We propose a two-stage training procedure that first fits lightweight specialist models from limited paired data, and then distills them into a generalist model by generating pseudo-samples. We establish generalization bounds showing that the required number of paired samples depends only on the complexity of the specialist model classes. We further extend the theory to diffusion policies for sequential decision-making to account for distribution shift in on-policy rollouts. Extensive experiments on robotic control and image restoration tasks are conducted to verify our theoretical results.
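The notion of a Pareto trade-off can be made concrete with a small, self-contained sketch (an illustration, not part of the paper). Each candidate model is summarized by its vector of per-task losses; a candidate is Pareto-optimal if no other candidate is at least as good on every task and strictly better on at least one. Minimizing a positively weighted sum of task losses is shown as one common way to select a single point on that front. The function names and the loss values in `LOSSES` are hypothetical.

```python
def dominates(a, b):
    """True if loss vector a Pareto-dominates b (lower is better):
    no worse on every task and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Candidates not dominated by any other candidate, in input order."""
    return [p for p in candidates if not any(dominates(q, p) for q in candidates)]


def weighted_choice(candidates, weights):
    """Candidate minimizing the weighted sum of task losses (weighted-sum scalarization).

    With strictly positive weights the chosen candidate is always on the Pareto
    front: anything dominating it would have a strictly smaller weighted sum.
    """
    return min(candidates, key=lambda losses: sum(w * l for w, l in zip(weights, losses)))


# Hypothetical (task-1 loss, task-2 loss) pairs for five candidate models.
LOSSES = [(0.10, 0.90), (0.30, 0.30), (0.90, 0.10), (0.40, 0.50), (0.95, 0.95)]
print(pareto_front(LOSSES))  # [(0.1, 0.9), (0.3, 0.3), (0.9, 0.1)]
print(weighted_choice(LOSSES, (0.5, 0.5)))  # (0.3, 0.3)
```

Changing the weights to `(0.9, 0.1)` selects `(0.1, 0.9)` instead, favoring the first task; each choice of weights picks out a different point on the same front.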