Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the severe long-tailed pixel imbalance in high-resolution remote sensing image semantic segmentation, further exacerbated by inconsistent class distributions between the Urban and Rural domains in the LoveDA dataset, which induces significant model bias. To tackle this challenge, the authors propose a prompt-controlled diffusion augmentation framework that first generates semantic layouts adhering to specified class proportions and domain attributes via a domain-aware masked-ratio conditioned discrete diffusion model, and then synthesizes photorealistic images using a ControlNet-guided Stable Diffusion pipeline. This approach enables, for the first time, explicit joint control over both class distribution and domain characteristics in synthetic samples. Extensive experiments demonstrate consistent and substantial improvements in minority-class performance and cross-domain generalization across multiple segmentation backbones, with particularly notable gains in Urban-to-Rural and Rural-to-Urban transfer scenarios.

📝 Abstract
Semantic segmentation of high-resolution remote-sensing imagery is critical for urban mapping and land-cover monitoring, yet training data typically exhibits severe long-tailed pixel imbalance. In the LoveDA dataset, this challenge is compounded by an explicit Urban/Rural split with distinct appearance and inconsistent class-frequency statistics across domains. We present a prompt-controlled diffusion augmentation framework that synthesizes paired label-image samples with explicit control of both domain and semantic composition. Stage A uses a domain-aware, masked-ratio-conditioned discrete diffusion model to generate layouts that satisfy user-specified class-ratio targets while respecting learned co-occurrence structure. Stage B translates layouts into photorealistic, domain-consistent images using Stable Diffusion with ControlNet guidance. Mixing the resulting ratio- and domain-controlled synthetic pairs with real data yields consistent improvements across multiple segmentation backbones, with gains concentrated on minority classes and improved Urban and Rural generalization, demonstrating controllable augmentation as a practical mechanism to mitigate long-tail bias in remote-sensing segmentation. Source code, pretrained models, and synthetic datasets are available at https://github.com/Buddhi19/SyntheticGen.git.
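The abstract's Stage A conditions layout generation on user-specified class-ratio targets derived from a long-tailed pixel distribution. A minimal sketch of that conditioning input, assuming LoveDA's 7-class scheme and a hypothetical temperature-based rebalancing rule (the paper does not specify how targets are chosen):

```python
import numpy as np

# Hypothetical illustration: measure per-class pixel frequencies over a batch of
# label maps, then build a flattened class-ratio target that up-weights minority
# classes. Class names follow LoveDA's 7-class scheme; the rebalancing rule is
# an assumption for illustration, not the authors' method.
CLASSES = ["background", "building", "road", "water", "barren", "forest", "agriculture"]

def pixel_frequencies(label_maps):
    """Fraction of all pixels belonging to each class across the label maps."""
    counts = np.zeros(len(CLASSES), dtype=np.int64)
    for lm in label_maps:
        counts += np.bincount(lm.ravel(), minlength=len(CLASSES))
    return counts / counts.sum()

def rebalanced_target(freqs, temperature=0.5):
    """Soften a long-tailed distribution: freqs**T renormalized (T < 1 flattens it)."""
    t = freqs ** temperature
    return t / t.sum()

# Toy long-tailed labels: mostly background, very little of the tail classes.
rng = np.random.default_rng(0)
labels = [rng.choice(len(CLASSES), size=(64, 64),
                     p=[.55, .20, .10, .06, .04, .03, .02])
          for _ in range(8)]
freqs = pixel_frequencies(labels)
target = rebalanced_target(freqs)
print("observed :", np.round(freqs, 3))
print("target   :", np.round(target, 3))
```

The resulting `target` vector is what a ratio-conditioned layout model would take as its conditioning signal; lowering `temperature` flattens the distribution further toward uniform.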
Problem

Research questions and friction points this paper is trying to address.

long-tail bias
semantic segmentation
remote-sensing imagery
class imbalance
domain shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

prompt-controlled diffusion
long-tail bias mitigation
semantic segmentation
domain-aware augmentation
ControlNet
Buddhi Wijenayake
Student at University of Peradeniya
Computer Vision, Statistics
Nichula Wasalathilake
University of Peradeniya, Peradeniya, Sri Lanka
R. Godaliyadda
University of Peradeniya, Peradeniya, Sri Lanka
Vijitha R. Herath
University of Peradeniya, Peradeniya, Sri Lanka
Parakrama B. Ekanayake
University of Peradeniya, Peradeniya, Sri Lanka
Vishal M. Patel
Associate Professor, ECE, Johns Hopkins University
Image Processing, Computer Vision, Biometrics, Medical Image Analysis, Remote Sensing