🤖 AI Summary
In specialized domains (e.g., medicine, biology, astronomy), teaching visual discrimination is hindered by ambiguous category boundaries, sparse and unpaired samples, and subtle visual differences that resist textual description.
Method: This paper proposes a counterfactual visualization learning framework based on diffusion models. Its core innovation is the first realization of disentangled manipulation in conditional latent space—separating category structure from instance identity—enabling high-fidelity, semantically controllable inter-class transition image generation without paired data.
Contribution/Results: The method successfully models fine-grained discriminative features across six professional domains and supports interpretable difference localization. User studies demonstrate significant improvements over real-sample-only baselines: +18.7% in novice discrimination accuracy and markedly increased subjective confidence (p < 0.01). It establishes a novel paradigm for cultivating domain-specific visual literacy under data-scarce conditions.
📝 Abstract
Human expertise depends on the ability to recognize subtle visual differences, such as distinguishing diseases, species, or celestial phenomena. We propose a new method to teach novices how to differentiate between nuanced categories in specialized domains. Our method uses generative models to visualize the minimal change in features to transition between classes, i.e., counterfactuals, and performs well even in domains where data is sparse, examples are unpaired, and category boundaries are not easily explained by text. By manipulating the conditioning space of diffusion models, our proposed method DIFFusion disentangles category structure from instance identity, enabling high-fidelity synthesis even in challenging domains. Experiments across six domains show accurate transitions even with limited and unpaired examples across categories. User studies confirm that our generated counterfactuals outperform unpaired examples in teaching perceptual expertise, showing the potential of generative models for specialized visual learning.