LatentDiff: Scaling Semantic Dataset Comparison to Millions of Images

📅 2026-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of efficiently detecting very subtle semantic shifts in large-scale image datasets, including cases where fewer than 1% of the images differ. The authors propose LatentDiff, a framework that operates in the latent space of pretrained vision encoders and combines sparse autoencoder-based divergence testing with density ratio estimation to detect semantic differences in a sensitive and interpretable way. LatentDiff is presented as the first method to incorporate sparse autoencoders into latent-space divergence analysis. The authors also introduce Noisy-Diff, a benchmark designed to reflect realistic, sparse distribution shifts. According to the reported experiments, LatentDiff is more accurate than existing image-caption-based methods at these very low shift ratios while requiring a fraction of their computational cost.
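One way to picture the sparse autoencoder side of this idea is a per-feature frequency test: if a sparse feature fires noticeably more often in one dataset than in the other, that feature is a candidate explanation for the shift. The sketch below is not the authors' implementation. The helper names (`simulate_sparse_codes`, `feature_shift_zscores`, `planted_feature_demo`), the synthetic activations that stand in for real sparse autoencoder outputs, and the choice of a two-proportion z-test are all assumptions made for illustration.

```python
import numpy as np


def simulate_sparse_codes(n, n_features, rate, rng):
    """Hypothetical stand-in for SAE activations: each feature fires
    independently with probability `rate` and is zero otherwise."""
    mask = rng.random((n, n_features)) < rate
    return mask * (0.1 + rng.random((n, n_features)))


def feature_shift_zscores(acts_a, acts_b, eps=1e-12):
    """Two-proportion z-score per feature: how much more often does each
    feature fire in dataset B than in dataset A?"""
    n_a, n_b = len(acts_a), len(acts_b)
    fire_a = (acts_a > 0).mean(axis=0)
    fire_b = (acts_b > 0).mean(axis=0)
    pooled = (fire_a * n_a + fire_b * n_b) / (n_a + n_b)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) + eps
    return (fire_b - fire_a) / se


def planted_feature_demo(seed=0, n=40_000, n_features=32, planted=7):
    """Plant a shift in 1% of dataset B on one feature and score all features."""
    rng = np.random.default_rng(seed)
    acts_a = simulate_sparse_codes(n, n_features, 0.02, rng)
    acts_b = simulate_sparse_codes(n, n_features, 0.02, rng)
    # Assumption: the semantic shift shows up as one feature firing
    # on an extra 1% of the images in dataset B.
    acts_b[: n // 100, planted] = 1.0
    return feature_shift_zscores(acts_a, acts_b)


z = planted_feature_demo()
print("most shifted feature:", int(np.argmax(z)))
```

Because each feature is tested separately, a shift that touches only 1% of images can still stand out when it is concentrated in a single sparse feature, which is the intuition behind the interpretability claim.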

📝 Abstract

We present LatentDiff, a scalable framework for semantic dataset comparison that operates directly in the latent space of pretrained vision encoders. By combining sparse autoencoder-based divergence testing with density ratio estimation, LatentDiff identifies interpretable semantic differences between datasets at a fraction of the computational cost of caption-based alternatives. We also introduce Noisy-Diff, a benchmark capturing realistic sparse distribution shifts that cause existing methods to struggle. Experiments demonstrate that LatentDiff achieves superior accuracy while remaining robust in settings where only a very small fraction of images (from 5% down to less than 1%) differ semantically.
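Density ratio estimation in a latent space can be illustrated with the standard probabilistic-classification trick: train a classifier to tell the two datasets apart, then turn its odds into an estimate of p_B(x) / p_A(x). The sketch below is an assumption-laden illustration, not the paper's method. Gaussian vectors stand in for encoder embeddings, the 5% planted shift and its size are invented for the demo, and `fit_logistic`, `density_ratio`, and `demo` are hypothetical helper names.

```python
import numpy as np


def fit_logistic(X, y, steps=500, lr=0.5, l2=1e-3):
    """Plain gradient-descent logistic regression; returns weights and bias."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def density_ratio(X_a, X_b):
    """Estimate r(x) = p_B(x) / p_A(x) for every row of X_b."""
    X = np.vstack([X_a, X_b])
    y = np.concatenate([np.zeros(len(X_a)), np.ones(len(X_b))])
    w, b = fit_logistic(X, y)
    # Bayes' rule: p_B / p_A = odds(B | x) * (n_A / n_B), and odds = exp(logit).
    return np.exp(X_b @ w + b) * (len(X_a) / len(X_b))


def demo(seed=0, n=5000, dim=8, shift_fraction=0.05):
    """Synthetic embeddings where a small fraction of dataset B is shifted."""
    rng = np.random.default_rng(seed)
    X_a = rng.normal(size=(n, dim))
    X_b = rng.normal(size=(n, dim))
    is_shifted = np.zeros(n, dtype=bool)
    is_shifted[: int(n * shift_fraction)] = True
    X_b[is_shifted, 0] += 4.0  # assumed shift along one latent direction
    return density_ratio(X_a, X_b), is_shifted


ratios, is_shifted = demo()
top = np.argsort(ratios)[-int(is_shifted.sum()):]
print("fraction of planted samples among top-ranked:", is_shifted[top].mean())
```

Ranking images by the estimated ratio surfaces the samples that are most characteristic of dataset B, which is how a ratio estimate can point at the minority of images responsible for a shift. A linear classifier is used here only to keep the sketch short.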

Problem

Research questions and friction points this paper is trying to address.

semantic dataset comparison

distribution shift

latent space

scalability

sparse differences

Innovation

Methods, ideas, or system contributions that make the work stand out.

LatentDiff

semantic dataset comparison

latent space