SADGE: Structure and Appearance Domain Gap Estimation of Synthetic and Real Data

📅 2026-05-21
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses the challenge of predicting how well a synthetic image dataset will transfer to real-world vision tasks without training downstream models. The authors propose SADGE, a scoring method, and show through extensive validation that neither appearance similarity nor geometric similarity alone suffices to predict transfer performance. Instead, SADGE models the nonlinear interaction between the two through a constrained bilinear fusion mechanism. Appearance similarity is computed with DINOv3, and geometric consistency is assessed with MASt3R. Evaluated across five benchmark families and 15 dataset variants (79,000 image pairs), SADGE reaches a Pearson correlation of r = 0.88 and a Spearman correlation of ρ = 0.77 with downstream performance on object detection, semantic segmentation, and pose estimation, outperforming the strongest appearance-only and geometry-only baselines.
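The summary says SADGE combines appearance and geometric similarity through a "constrained bilinear" interaction, but this page does not give the formula. The sketch below shows one plausible reading and is not the authors' method. It assumes both scores are already normalized to [0, 1], for example an appearance score derived from DINOv3 features and a geometry score derived from MASt3R. It also assumes that "constrained" means non-negative weights summing to 1. The names `FusionWeights` and `fused_score` and the three-term form are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionWeights:
    """Hypothetical weights for a constrained bilinear fusion.

    Assumed constraint: all weights are non-negative and sum to 1, so that
    for appearance/geometry scores in [0, 1] the fused score stays in [0, 1].
    """
    w_app: float    # weight on the appearance-only term
    w_geo: float    # weight on the geometry-only term
    w_inter: float  # weight on the bilinear (appearance x geometry) term

    def __post_init__(self) -> None:
        weights = (self.w_app, self.w_geo, self.w_inter)
        if any(w < 0 for w in weights):
            raise ValueError("fusion weights must be non-negative")
        if abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("fusion weights must sum to 1")


def fused_score(app: float, geo: float, weights: FusionWeights) -> float:
    """Fuse an appearance score and a geometry score, both assumed in [0, 1]."""
    for name, value in (("app", app), ("geo", geo)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return (
        weights.w_app * app
        + weights.w_geo * geo
        # Interaction term: large only when BOTH similarities are high.
        + weights.w_inter * app * geo
    )


if __name__ == "__main__":
    w = FusionWeights(w_app=0.2, w_geo=0.2, w_inter=0.6)
    print(fused_score(0.9, 0.9, w))  # balanced: high appearance and geometry
    print(fused_score(0.9, 0.1, w))  # high appearance, poor geometry
```

With these illustrative weights, a dataset that scores 0.9 on both axes receives about 0.846. A dataset that matches real appearance (0.9) but not real geometry (0.1) receives about 0.254. The interaction term is what penalizes the imbalance: a purely additive score would treat these two cases much more alike.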
📝 Abstract
We propose SADGE, a quantitative similarity metric that predicts the performance of synthetic image datasets for common computer vision tasks without downstream model training. Estimating whether a synthetic dataset will lead to a model that performs well on real-world data remains a bottleneck in model development. Existing evaluation metrics (e.g., PSNR, FID, CLIP) primarily measure semantic alignment between real and synthetic images (Appearance Similarity Score). Less commonly, structural similarity between images is considered to assess the domain gap (Geometric Similarity Score). However, to the best of our knowledge, no existing studies evaluate which similarity metric best predicts downstream performance for a given synthetic dataset. In this paper, we show across a wide variety of synthetic datasets and downstream tasks that neither appearance nor geometry alone can reliably predict downstream performance; rather, it is their non-linear interplay that dictates synthetic data utility. Specifically, we measure how commonly used Appearance and Geometric Similarity metrics computed between synthetic and real images correlate with downstream performance in object detection, semantic segmentation, and pose estimation. Across five public synthetic-to-real benchmark families and 15 dataset-level variants (79k image pairs), SADGE achieves the strongest association with downstream transfer performance under both linear and rank-based criteria, reaching Pearson r = 0.88 and Spearman ρ = 0.77. For each combination of a geometry-based and an appearance-based method, we compute SADGE scores across all benchmark families. The best configuration is obtained by fusing DINOv3 appearance similarity with MASt3R geometric consistency through a constrained bilinear interaction, outperforming both the strongest geometry-only baseline and the strongest appearance-only baseline.
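The abstract judges a metric by how strongly its dataset-level scores correlate with downstream performance, using a linear criterion (Pearson r) and a rank-based criterion (Spearman ρ). The pure-Python sketch below shows how those two numbers are computed. The helper names `pearson_r`, `average_ranks`, and `spearman_rho` are illustrative, and the example scores and mAP values are made up, not taken from the paper.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Linear (Pearson) correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Rank-based (Spearman) correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical dataset-level similarity scores and downstream mAP values.
    scores = [0.31, 0.45, 0.52, 0.67, 0.80]
    map_real = [22.0, 30.5, 29.0, 41.2, 48.9]
    print("Pearson r:", pearson_r(scores, map_real))
    print("Spearman rho:", spearman_rho(scores, map_real))
```

The two criteria answer different questions. Pearson r rewards a score that tracks performance linearly. Spearman ρ only asks whether the score orders datasets the same way performance does, which is often what matters when choosing between candidate synthetic datasets. In the toy data above, one swapped pair (29.0 vs 30.5) gives a Spearman ρ of 0.9.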
Problem

Research questions and friction points this paper is trying to address.

domain gap
synthetic data
appearance similarity
geometric similarity
downstream performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

SADGE
domain gap estimation
synthetic-to-real transfer
appearance-geometry interaction
zero-shot dataset evaluation