SADGE: Structure and Appearance Domain Gap Estimation of Synthetic and Real Data

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the problem of predicting how useful a synthetic image dataset will be for real-world vision tasks without training downstream models. The authors propose SADGE, a scoring method, and show across a broad set of synthetic datasets and tasks that neither appearance similarity nor geometric similarity alone reliably predicts transfer performance. SADGE instead models their non-linear interaction through a constrained bilinear fusion. In the best configuration, appearance similarity is computed with DINOv3 and geometric consistency with MASt3R. Evaluated on five benchmark families and 15 dataset-level variants (79k image pairs), SADGE reaches Pearson r = 0.88 and Spearman ρ = 0.77 against downstream performance in object detection, semantic segmentation, and pose estimation, outperforming both the strongest appearance-only baseline and the strongest geometry-only baseline.

📝 Abstract

We propose SADGE, a quantitative similarity metric that predicts the performance of synthetic image datasets for common computer vision tasks without downstream model training. Estimating whether a synthetic dataset will lead to a model that performs well on real-world data remains a bottleneck in model development. Existing evaluation metrics (e.g., PSNR, FID, CLIP) primarily measure semantic alignment between real and synthetic images (Appearance Similarity Score). Less commonly, structural similarity between images is considered to assess the domain gap (Geometric Similarity Score). However, to the best of our knowledge, there exist no studies that evaluate which similarity metric is the best downstream predictor for a given synthetic dataset. In this paper, we show over a wide variety of synthetic datasets and downstream tasks that neither appearance nor geometry alone can reliably predict downstream performance; rather, it is their non-linear interplay that dictates synthetic data utility. Specifically, we measure how commonly used Appearance and Geometric Similarity metrics computed between synthetic and real images correlate with downstream performance in object detection, semantic segmentation, and pose estimation. We compute SADGE scores for each combination of geometry-based and appearance-based methods across all benchmark families. Across five public synthetic-to-real benchmark families and 15 dataset-level variants (79k image pairs), SADGE achieves the strongest association with downstream transfer performance under both linear and rank-based criteria, reaching Pearson r=0.88 and Spearman rho=0.77. The best configuration is obtained by fusing DINOv3 appearance similarity with MASt3R geometric consistency through a constrained bilinear interaction, outperforming both the strongest geometry-only baseline and the strongest appearance-only baseline.
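To make the evaluation protocol concrete, the following is a minimal Python sketch of how a fused score could be compared against downstream performance with Pearson and Spearman correlation. The abstract does not give the exact form of the constrained bilinear interaction, so the function `bilinear_fusion`, its weights `w_a`, `w_g`, `w_ag`, and the constraint (non-negative weights summing to 1) are assumptions made purely for illustration. The toy numbers in the usage lines are invented and are not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def bilinear_fusion(appearance, geometry, w_a, w_g, w_ag):
    """Hypothetical constrained bilinear fusion (assumed form, not the paper's).

    score = w_a * appearance + w_g * geometry + w_ag * appearance * geometry,
    with all weights non-negative and summing to 1.
    """
    if min(w_a, w_g, w_ag) < 0 or not math.isclose(w_a + w_g + w_ag, 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return w_a * appearance + w_g * geometry + w_ag * appearance * geometry


# Toy, made-up dataset-level scores (NOT from the paper).
appearance_scores = [0.62, 0.71, 0.55, 0.80, 0.67]
geometry_scores = [0.40, 0.75, 0.35, 0.70, 0.90]
downstream_metric = [0.31, 0.52, 0.25, 0.58, 0.49]

fused = [
    bilinear_fusion(a, g, w_a=0.2, w_g=0.2, w_ag=0.6)
    for a, g in zip(appearance_scores, geometry_scores)
]
print("Pearson r:", pearson(fused, downstream_metric))
print("Spearman rho:", spearman(fused, downstream_metric))
```

The product term is what lets a dataset that scores high on one axis but low on the other be penalized, which a purely additive combination cannot express. In practice the weights would presumably be fitted on dataset-level results rather than fixed by hand as they are here.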

Problem

Research questions and friction points this paper is trying to address.

domain gap

synthetic data

appearance similarity

geometric similarity

downstream performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

SADGE

domain gap estimation

synthetic-to-real transfer