🤖 AI Summary
Problem: Image-to-image (I2I) translation lacks systematic characterization and evaluation of content preservation, hindering task definition, model selection, and performance assessment.
Method: We propose the first taxonomy of I2I tasks based on how strictly content must be preserved (full, partial, or no preservation) and design a unified evaluation framework covering nearly 70 models, 30 tasks/datasets, and more than 10 quantitative metrics (including FID, LPIPS, SSIM, and semantic consistency). Surveying GANs, VAEs, diffusion models, and disentangled representation learning, we formulate application-driven model selection principles.
Contribution/Results: We release the Sim-to-Real Translation Benchmark—the first comprehensive I2I benchmark explicitly focused on content preservation—and conduct large-scale, reproducible cross-model benchmarking. This work establishes theoretical foundations and practical standards for I2I task formulation, model selection, and evaluation.
📝 Abstract
Image-to-image (I2I) translation transforms an image from a source domain to a target domain while preserving source content. Most computer vision applications fall within the field of image-to-image translation, including style transfer, image segmentation, and photo enhancement. The degree to which source content must be preserved during translation varies with the problem and the intended application. From this point of view, we divide I2I tasks into three categories: fully content-preserving, partially content-preserving, and non-content-preserving. For each category, we present the tasks, datasets, methods, and reported results. We also categorize I2I methods by model architecture and study each category separately. In addition, we introduce well-known evaluation criteria in the I2I translation field. Specifically, we analyze nearly 70 I2I models and introduce and assess more than 10 quantitative evaluation metrics and 30 distinct tasks and datasets relevant to the I2I translation problem. Translation from simulated to real images can be viewed as an application of fully or partially content-preserving unsupervised I2I translation methods, so we provide a benchmark for Sim-to-Real translation that can be used to evaluate different methods. In general, we conclude that because applications differ in how strictly content must be preserved, this requirement should be taken into account when choosing an I2I model for a specific application.