AI Summary
Detecting odor-related objects in historical artworks is difficult. Beyond the stylistic diversity of artworks, the task requires fine-grained categories, which leads to sparse annotations and extreme class imbalance. Method: We propose a synthetic data augmentation framework based on Latent Diffusion Models (LDMs). By fine-tuning a pre-trained LDM, we generate high-fidelity, semantically controllable images of odor-related objects. We further evaluate several diffusion-based augmentation strategies, including conditional guidance, localized inpainting, and style alignment, to enrich underrepresented classes. Contribution/Results: Joint training on real and synthetic data improves detection performance in low-resource settings, as measured by mean Average Precision (mAP). The approach remains effective with relatively small amounts of data and shows potential to scale to long-tailed domains such as cultural heritage analysis, where annotated data is scarce and class distributions are highly skewed.
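The summary does not say how synthetic images are allocated across classes or how they are combined with real data. The sketch below is a minimal illustration under two assumptions that are not taken from the paper. First, a hypothetical "top-up" rule (`synthetic_budget`) requests enough synthetic instances to bring every class up to a target count. Second, a hypothetical `mix_datasets` helper caps the synthetic share relative to the real data before joint training. The class names in the example are also illustrative only.

```python
from collections import Counter


def synthetic_budget(labels, target=None):
    """Return how many synthetic instances to generate per class.

    Hypothetical top-up rule: each class is filled up to `target`
    (default: the size of the largest class). Classes already at or
    above the target receive 0.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    if target is None:
        target = max(counts.values())
    # Sort by class name so the result has a stable order.
    return {cls: max(0, target - n) for cls, n in sorted(counts.items())}


def mix_datasets(real, synthetic, max_synthetic_ratio=1.0):
    """Concatenate real and synthetic samples for joint training.

    Keeps at most `max_synthetic_ratio * len(real)` synthetic samples,
    an assumed safeguard against synthetic data dominating training.
    """
    real = list(real)
    cap = int(len(real) * max_synthetic_ratio)
    return real + list(synthetic)[:cap]


if __name__ == "__main__":
    # Illustrative, made-up annotation counts for three smell-related classes.
    labels = ["pipe"] * 3 + ["rose"] * 10 + ["censer"]
    print(synthetic_budget(labels))
```

Run as a script, this prints `{'censer': 9, 'pipe': 7, 'rose': 0}`: the rarest class receives the most synthetic samples, and the majority class receives none. A real pipeline would also need to decide which augmentation strategy produces each sample, which this sketch leaves open.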
Abstract
Finding smell references in historic artworks is a challenging problem. Beyond artwork-specific challenges such as stylistic variations, recognizing these references demands exceptionally detailed annotation classes, resulting in annotation sparsity and extreme class imbalance. In this work, we explore the potential of synthetic data generation to alleviate these issues and enable accurate detection of smell-related objects. We evaluate several diffusion-based augmentation strategies and demonstrate that incorporating synthetic data into model training can improve detection performance. Our findings suggest that leveraging the large-scale pretraining of diffusion models offers a promising approach for improving detection accuracy, particularly in niche applications where annotations are scarce and costly to obtain. Furthermore, the proposed approach proves effective even with relatively small amounts of data, and scaling it up holds considerable potential for further improvement.