🤖 AI Summary
This work addresses the inefficiency of retraining separate models for arbitrary cross-domain image translation—bidirectional, one-to-many, and many-to-one—in semi-supervised multi-domain settings. We propose a unified conditional diffusion framework that assigns each domain an independent noise level; missing domains are naturally modeled as fully noised states, enabling semantically guided progressive reconstruction across domains. Key technical contributions include per-domain noise scheduling, a semi-supervised training formulation, and natural handling of arbitrary input/output domain partitions. Evaluated on a multi-domain synthetic image translation benchmark, the method generalizes across domain configurations without retraining and supports challenging semantic domain inversion—e.g., reversing translation directions—a capability unattainable with conventional one-model-per-configuration paradigms. This addresses longstanding efficiency and scalability bottlenecks in multi-domain image translation.
📝 Abstract
Domain-to-domain translation involves generating a target domain sample given a condition in the source domain. Most existing methods focus on fixed input and output domains, i.e. they only work for specific configurations (e.g. for two domains, either $D_1 \rightarrow D_2$ or $D_2 \rightarrow D_1$). This paper proposes Multi-Domain Diffusion (MDD), a conditional diffusion framework for multi-domain translation in a semi-supervised context. Unlike previous methods, MDD does not require defining input and output domains, allowing translation between any partition of domains within a set (such as $(D_1, D_2) \rightarrow D_3$, $D_2 \rightarrow (D_1, D_3)$, $D_3 \rightarrow D_1$, etc. for 3 domains), without the need to train separate models for each domain configuration. The key idea behind MDD is to leverage the noise formulation of diffusion models by incorporating one noise level per domain, which allows missing domains to be modeled with noise in a natural way. This transforms the training task from a simple reconstruction task into a domain translation task, in which the model relies on less noisy domains to reconstruct noisier domains. We present results on a multi-domain (more than two domains) synthetic image translation dataset with challenging semantic domain inversion.
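The per-domain noising idea described above can be sketched as follows. This is a minimal illustration under standard DDPM assumptions, not the authors' implementation: every function and variable name here (`noise_domain`, `make_training_input`, the linear beta schedule) is a hypothetical choice for exposition. The core point is that each domain carries its own timestep, and a missing domain is simply assigned the maximum timestep, i.e. pure noise.

```python
import numpy as np

T = 1000  # total diffusion steps (assumed; value is illustrative)
betas = np.linspace(1e-4, 0.02, T)          # linear beta schedule
alphas_cumprod = np.cumprod(1.0 - betas)    # \bar{alpha}_t

def noise_domain(x, t, rng):
    """Standard DDPM forward process applied to one domain at its own step t."""
    if t == 0:
        return x.copy()  # fully clean domain, used as conditioning
    a = alphas_cumprod[t - 1]
    return np.sqrt(a) * x + np.sqrt(1.0 - a) * rng.standard_normal(x.shape)

def make_training_input(domains, timesteps, rng, shape=(8, 8)):
    """Noise each available domain at its own level; a missing domain
    (None) is modeled as pure noise by assigning the maximum step T."""
    noised = []
    for x, t in zip(domains, timesteps):
        if x is None:            # missing domain -> treat as fully noised
            x, t = np.zeros(shape), T
        noised.append(noise_domain(x, t, rng))
    return noised

rng = np.random.default_rng(0)
d1 = rng.standard_normal((8, 8))
# Translate D1 -> D2: keep D1 clean (t = 0), model D2 as pure noise (t = T).
inputs = make_training_input([d1, None], [0, T], rng)
```

Because each domain carries an independent timestep, swapping which domains are clean and which are noised changes the translation direction at inference time with the same trained model, which is what removes the need for one model per configuration.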