🤖 AI Summary
In semantic segmentation, vision foundation models face two obstacles: scarce annotated data and distribution shifts between source and target domains. To address these challenges, we propose Rein++, a novel framework featuring three key innovations: (1) instance-aware learnable tokens that enhance fine-grained feature representation; (2) a dual-level unsupervised domain adaptation mechanism operating at both the instance and logit levels; and (3) a class-agnostic boundary detail transfer module that incorporates boundary priors from Segment Anything. Rein++ trains efficiently even on billion-parameter models by fine-tuning fewer than 1% of the backbone parameters. Extensive experiments demonstrate significant improvements over state-of-the-art methods across multiple cross-domain segmentation benchmarks. The method exhibits strong generalization and robustness to illumination variations and scene changes, validating its effectiveness in real-world deployment scenarios.
📝 Abstract
Vision Foundation Models (VFMs) have achieved remarkable success in various computer vision tasks. However, their application to semantic segmentation is hindered by two significant challenges: (1) the disparity in data scale, as segmentation datasets are typically much smaller than those used for VFM pre-training, and (2) domain distribution shifts, where real-world segmentation scenarios are diverse and often underrepresented during pre-training. To overcome these limitations, we present Rein++, an efficient VFM-based segmentation framework that demonstrates superior generalization from limited data and enables effective adaptation to diverse unlabeled scenarios. Specifically, Rein++ comprises a domain generalization solution, Rein-G, and a domain adaptation solution, Rein-A. Rein-G introduces a set of trainable, instance-aware tokens that effectively refine the VFM's features for the segmentation task. This parameter-efficient approach fine-tunes less than 1% of the backbone's parameters, enabling robust generalization. Building on Rein-G, Rein-A performs unsupervised domain adaptation at both the instance and logit levels to mitigate domain shifts. In addition, it incorporates a semantic transfer module that leverages the class-agnostic capabilities of the Segment Anything Model to enhance boundary details in the target domain. The integrated Rein++ pipeline first learns a generalizable model on a source domain (e.g., daytime scenes) and subsequently adapts it to diverse target domains (e.g., nighttime scenes) without any target labels. Comprehensive experiments demonstrate that Rein++ significantly outperforms state-of-the-art methods with efficient training, underscoring its role as an efficient, generalizable, and adaptive segmentation solution for VFMs, even for models with billions of parameters. The code is available at https://github.com/wloves/Rein.
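To make the token-refinement idea concrete, here is a minimal NumPy sketch of how a small set of learnable tokens can refine frozen backbone features through attention-style similarity, as the abstract describes. This is an illustrative simplification under our own assumptions, not the authors' implementation: the function name `rein_refine`, the single-layer setting, and the residual update are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def rein_refine(feats, tokens):
    """Refine frozen-backbone patch features with a few learnable tokens.

    feats:  (n, c) patch features from one VFM layer (frozen during training).
    tokens: (m, c) learnable tokens with m << n -- in this sketch they are
            the only trainable parameters, which is what keeps the tuned
            parameter count far below the backbone's size.
    """
    c = feats.shape[-1]
    # Similarity between every patch feature and every token.
    sim = softmax(feats @ tokens.T / np.sqrt(c), axis=-1)  # (n, m)
    # Each feature receives a token mixture weighted by that similarity.
    delta = sim @ tokens                                   # (n, c)
    # Residual update: the frozen features are adjusted, not replaced.
    return feats + delta

rng = np.random.default_rng(0)
feats = rng.standard_normal((196, 64))   # e.g. 14x14 patches, feature dim 64
tokens = rng.standard_normal((16, 64))   # 16 tokens: a tiny parameter budget
refined = rein_refine(feats, tokens)
print(refined.shape)
```

Because only `tokens` (16 x 64 values here) would receive gradients while `feats` comes from a frozen backbone, the trainable fraction stays tiny, mirroring the sub-1% budget claimed in the abstract.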