🤖 AI Summary
Image-to-image translation tasks such as object removal, normal and depth estimation, controllable relighting, and shadow generation often suffer from slow inference and weak generalization across tasks. This paper proposes Latent Bridge Matching (LBM), which applies Bridge Matching in a latent space to perform image translation in a single inference step. The key contributions are: (1) a latent-space Bridge Matching formulation that transports the source-image distribution to the target-image distribution; (2) a conditional variant of LBM that enables controllable relighting and shadow generation; and (3) a single framework that covers diverse translation tasks. Experiments show state-of-the-art results on several image-to-image tasks using only one inference step, which avoids the cost of multi-step diffusion sampling. The code is publicly available.
📝 Abstract
In this paper, we introduce Latent Bridge Matching (LBM), a new, versatile and scalable method that relies on Bridge Matching in a latent space to achieve fast image-to-image translation. We show that the method can reach state-of-the-art results for various image-to-image tasks using only a single inference step. In addition to its efficiency, we also demonstrate the versatility of the method across different image translation tasks such as object removal, normal and depth estimation, and object relighting. We also derive a conditional framework of LBM and demonstrate its effectiveness by tackling the tasks of controllable image relighting and shadow generation. We provide an open-source implementation of the method at https://github.com/gojasper/LBM.
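To make the idea of Bridge Matching concrete, the following is a minimal sketch of the standard Brownian-bridge formulation, not the authors' implementation. It assumes latents are plain lists of floats, that the intermediate point is drawn as x_t = (1 - t) x0 + t x1 + sigma * sqrt(t (1 - t)) * eps, and that the network regresses the drift (x1 - x_t) / (1 - t). The function names, the `sigma` parameter, and the single Euler step used for one-step inference are all illustrative assumptions; the abstract does not specify these details.

```python
import math
import random


def sample_bridge_point(x0, x1, t, sigma, rng):
    """Sample x_t from a Brownian bridge pinned at x0 (t=0) and x1 (t=1).

    Assumed form: mean (1 - t) * x0 + t * x1, std sigma * sqrt(t * (1 - t)).
    """
    std = sigma * math.sqrt(t * (1.0 - t))
    return [(1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0)
            for a, b in zip(x0, x1)]


def drift_target(x_t, x1, t):
    """Regression target for the drift at time t < 1: (x1 - x_t) / (1 - t)."""
    return [(b - c) / (1.0 - t) for b, c in zip(x1, x_t)]


def bridge_matching_loss(pred, target):
    """Mean squared error between predicted and target drift."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def one_step_translate(x0, drift_fn):
    """Hypothetical single-step inference: one Euler step from t=0 to t=1."""
    v = drift_fn(x0, 0.0)
    return [a + d for a, d in zip(x0, v)]


if __name__ == "__main__":
    rng = random.Random(0)
    source_latent = [0.0, 1.0]   # stand-in for an encoded source image
    target_latent = [2.0, -1.0]  # stand-in for an encoded target image

    t = rng.uniform(0.0, 0.99)   # keep t away from 1 to avoid division by zero
    x_t = sample_bridge_point(source_latent, target_latent, t, sigma=0.1, rng=rng)
    target = drift_target(x_t, target_latent, t)

    # A real model would predict the drift; a zero prediction is used here
    # only to show how the loss is evaluated.
    print(bridge_matching_loss([0.0, 0.0], target))
```

In a latent setup of this kind, `source_latent` and `target_latent` would come from an image encoder, and the output of `one_step_translate` would be decoded back to pixel space; both steps are omitted here.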