Rein++: Efficient Generalization and Adaptation for Semantic Segmentation with Vision Foundation Models

📅 2025-08-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
In semantic segmentation, vision foundation models suffer from scarce annotated data and source-target domain distribution shifts. To address these challenges, we propose Rein++, a novel framework featuring three key innovations: (1) instance-aware learnable tokens to enhance fine-grained feature representation; (2) a dual-level unsupervised domain adaptation mechanism operating at both instance and logit levels; and (3) a class-agnostic boundary detail transfer module that incorporates boundary priors from Segment Anything. Rein++ achieves efficient training on billion-parameter models by fine-tuning fewer than 1% of backbone parameters. Extensive experiments demonstrate significant improvements over state-of-the-art methods across multiple cross-domain segmentation benchmarks. The method exhibits strong generalization capability and robustness to illumination variations and scene changes, validating its effectiveness in real-world deployment scenarios.

📝 Abstract
Vision Foundation Models (VFMs) have achieved remarkable success in various computer vision tasks. However, their application to semantic segmentation is hindered by two significant challenges: (1) the disparity in data scale, as segmentation datasets are typically much smaller than those used for VFM pre-training, and (2) domain distribution shifts, where real-world segmentation scenarios are diverse and often underrepresented during pre-training. To overcome these limitations, we present Rein++, an efficient VFM-based segmentation framework that demonstrates superior generalization from limited data and enables effective adaptation to diverse unlabeled scenarios. Specifically, Rein++ comprises a domain generalization solution, Rein-G, and a domain adaptation solution, Rein-A. Rein-G introduces a set of trainable, instance-aware tokens that effectively refine the VFM's features for the segmentation task. This parameter-efficient approach fine-tunes less than 1% of the backbone's parameters, enabling robust generalization. Building on Rein-G, Rein-A performs unsupervised domain adaptation at both the instance and logit levels to mitigate domain shifts. In addition, it incorporates a semantic transfer module that leverages the class-agnostic capabilities of the Segment Anything Model to enhance boundary details in the target domain. The integrated Rein++ pipeline first learns a generalizable model on a source domain (e.g., daytime scenes) and subsequently adapts it to diverse target domains (e.g., nighttime scenes) without any target labels. Comprehensive experiments demonstrate that Rein++ significantly outperforms state-of-the-art methods with efficient training, underscoring its role as an efficient, generalizable, and adaptive segmentation solution for VFMs, even for large models with billions of parameters. The code is available at https://github.com/wloves/Rein.
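
To make the token-based refinement concrete, here is a minimal sketch in plain Python. It assumes an attention-like update in which frozen backbone features attend to a small set of learnable tokens and receive a residual correction; this is an illustrative formulation, not necessarily the exact Rein-G layer. The function names, the tensor sizes, and the parameter counts used in the fraction calculation are all hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def refine_features(features, tokens):
    """Add a token-driven residual to frozen backbone features.

    features: n patch vectors of length c (frozen backbone output, assumed)
    tokens:   m learnable vectors of length c (the only trainable part here)
    """
    c = len(tokens[0])
    refined = []
    for f in features:
        # Similarity of this patch to every token, scaled as in attention.
        scores = [sum(a * b for a, b in zip(f, t)) / math.sqrt(c) for t in tokens]
        weights = softmax(scores)
        # Residual: similarity-weighted mix of the tokens.
        delta = [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(c)]
        refined.append([a + b for a, b in zip(f, delta)])
    return refined


def trainable_fraction(backbone_params, num_layers, tokens_per_layer, dim):
    """Share of parameters that are trainable if only the tokens are tuned."""
    token_params = num_layers * tokens_per_layer * dim
    return token_params / (backbone_params + token_params)


random.seed(0)
features = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]   # 4 patches, c = 8
tokens = [[random.gauss(0, 0.02) for _ in range(8)] for _ in range(3)]  # 3 tokens
refined = refine_features(features, tokens)

# Hypothetical sizes: 1B-parameter backbone, 24 layers, 100 tokens of width 1024.
fraction = trainable_fraction(1_000_000_000, 24, 100, 1024)
print(f"trainable share: {fraction:.4%}")
```

Under these made-up sizes the tokens account for well under 1% of all parameters, which is the regime the abstract describes. Because the backbone stays frozen, only `tokens` would need gradients and optimizer state.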

Problem

Research questions and friction points this paper is trying to address.

Bridges data scale gap for VFM-based semantic segmentation
Addresses domain shifts in diverse segmentation scenarios
Enables efficient adaptation without target domain labels

Innovation

Methods, ideas, or system contributions that make the work stand out.

Trainable instance-aware tokens refine VFM features
Parameter-efficient fine-tuning under 1% backbone parameters
Unsupervised domain adaptation at instance and logit levels
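
As one illustration of what adaptation at the logit level can look like without target labels, the sketch below implements confidence-thresholded pseudo-labeling, a generic self-training technique. It is swapped in for illustration only: the text above does not specify the loss Rein-A uses, and the function name `pseudo_label_loss` and the 0.9 threshold are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label_loss(teacher_logits, student_logits, threshold=0.9):
    """Mean cross-entropy of the student against confident teacher pseudo-labels.

    Both arguments are lists of per-pixel class-logit lists for unlabeled
    target images. Pixels whose top teacher probability falls below
    `threshold` are ignored. Returns 0.0 if no pixel is confident enough.
    """
    losses = []
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        t_probs = softmax(t_logits)
        confidence = max(t_probs)
        if confidence < threshold:
            continue  # skip uncertain pixels instead of training on noisy labels
        label = t_probs.index(confidence)  # pseudo-label = teacher's top class
        s_probs = softmax(s_logits)
        losses.append(-math.log(s_probs[label]))
    return sum(losses) / len(losses) if losses else 0.0


teacher = [[6.0, 0.0, 0.0], [0.2, 0.1, 0.0]]  # one confident pixel, one uncertain
student = [[2.0, 1.0, 0.0], [0.0, 3.0, 0.0]]
loss = pseudo_label_loss(teacher, student)
print(f"loss over confident pixels: {loss:.4f}")
```

Only the first pixel contributes here, since the teacher's second prediction is close to uniform and is filtered out by the threshold.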

Zhixiang Wei
University of Science and Technology of China
Xiaoxiao Ma
Oracle, Macquarie University
LLM, deep generative models, anomaly detection, graph neural networks
Ruishen Yan
University of Science and Technology of China
Tao Tu
Columbia University, Google
multi-modal neuroimaging, machine learning, neural information processing
Huaian Chen
University of Science and Technology of China
Jinjin Zheng
University of Science and Technology of China
Yi Jin
University of Science and Technology of China
Enhong Chen
University of Science and Technology of China
data mining, recommender system, machine learning