🤖 AI Summary
To address data scarcity and high annotation costs in semantic segmentation, this paper proposes Concept-Aware LoRA (CA-LoRA), a fine-tuning method for text-to-image generation models that are then used to synthesize domain-aligned labeled data. CA-LoRA identifies and updates only the low-rank parameters associated with the concepts needed for domain alignment, such as viewpoint and style. This aligns generated samples with the target domain while preserving pretrained knowledge, mitigating the overfitting and memorization that conventional fine-tuning is prone to. The method introduces a concept importance scoring mechanism and a text-guided, domain-aligned generation strategy. On urban-scene segmentation, CA-LoRA outperforms baselines and state-of-the-art methods in mIoU across few-shot, fully supervised, and domain generalization settings, with the gains most pronounced under domain shifts such as adverse weather and drastic illumination changes.
📝 Abstract
This paper addresses the challenge of data scarcity in semantic segmentation by generating datasets with text-to-image (T2I) generation models, reducing image acquisition and labeling costs. Segmentation dataset generation faces two key challenges: 1) aligning generated samples with the target domain and 2) producing informative samples beyond the training data. Fine-tuning a T2I model can help it generate samples aligned with the target domain. However, the fine-tuned model often overfits and memorizes the training data, limiting its ability to generate diverse and well-aligned samples. To overcome these issues, we propose Concept-Aware LoRA (CA-LoRA), a novel fine-tuning approach that selectively identifies and updates only the weights associated with the concepts necessary for domain alignment (e.g., style or viewpoint) while preserving the pretrained knowledge of the T2I model to produce informative samples. We demonstrate its effectiveness in generating datasets for urban-scene segmentation, where it outperforms baseline and state-of-the-art methods in in-domain (few-shot and fully supervised) settings as well as in domain generalization tasks, especially under challenging conditions such as adverse weather and varying illumination.
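As a rough illustration of the selective-update idea, the sketch below applies a standard LoRA low-rank update (W + scale * B A) only to the layers that rank highest under a per-layer concept score. It is not the paper's implementation: the layer names, the score values, the top-k selection rule, and the helper names (`select_layers`, `effective_weight`, `adapt_selected`) are all hypothetical, and how CA-LoRA actually scores and selects weights is not specified in this abstract.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, sufficient for a tiny illustration."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def select_layers(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k layer names with the highest (hypothetical) concept score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


def effective_weight(w: Matrix, a: Matrix, b: Matrix, scale: float = 1.0) -> Matrix:
    """Standard LoRA merge: W + scale * (B @ A); W itself is left unmodified."""
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def adapt_selected(
    weights: Dict[str, Matrix],
    adapters: Dict[str, Tuple[Matrix, Matrix]],  # layer name -> (A, B)
    scores: Dict[str, float],
    k: int,
    scale: float = 1.0,
) -> Dict[str, Matrix]:
    """Apply the low-rank update only to the top-k layers; keep the rest frozen."""
    chosen = set(select_layers(scores, k))
    return {
        name: effective_weight(w, *adapters[name], scale) if name in chosen else w
        for name, w in weights.items()
    }


# Toy example with made-up layer names and scores (assumptions, not paper values).
identity = [[1.0, 0.0], [0.0, 1.0]]
weights = {"attn.q": identity, "attn.k": identity, "mlp": identity}
scores = {"attn.q": 0.9, "attn.k": 0.1, "mlp": 0.2}
adapters = {"attn.q": ([[0.5, 0.0]], [[1.0], [2.0]])}  # rank-1 A (1x2) and B (2x1)

adapted = adapt_selected(weights, adapters, scores, k=1)
print(select_layers(scores, 1), adapted["attn.q"], adapted["mlp"])
```

In this toy setup only the highest-scoring layer receives a low-rank update, while the remaining layers keep their pretrained weights, which mirrors the abstract's stated goal of aligning with the target domain without overwriting pretrained knowledge.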