🤖 AI Summary
This work addresses the underutilization of subspace structures in vision foundation models, as well as LoRA's limited representational diversity and inefficient parameter utilization, in domain-generalized semantic segmentation. To this end, the authors propose a dual-LoRA adaptation mechanism guided by rank-revealing QR (RRQR) decomposition. By explicitly uncovering dominant and subordinate subspace directions within pretrained models via RRQR, the method employs an auxiliary LoRA to lightly fine-tune the dominant directions while a primary LoRA learns along the subordinate directions, thereby enhancing feature diversity without additional regularization or architectural complexity. Notably, this is the first approach to leverage RRQR for initializing LoRA modules. The proposed method achieves state-of-the-art performance on both synthetic-to-real and real-to-real cross-domain segmentation tasks while maintaining computational efficiency at inference time.
📝 Abstract
Domain Generalized Semantic Segmentation (DGSS) aims to maintain robust performance across unseen target domains. Vision Foundation Models (VFMs) offer rich multi-domain knowledge that can enhance generalization. However, strategies for actively exploiting the rich subspace structures within VFMs remain under-explored, with many existing methods focusing primarily on preserving pre-trained knowledge. Furthermore, their LoRA components often suffer from limited representational diversity and inefficient parameter utilization. We propose RecycleLoRA, which addresses both challenges by employing Rank-Revealing QR Decomposition (RRQR) to systematically exploit the subspace structures of VFMs and enhance LoRA's representational richness. Our main adapter leverages the minor subspace directions identified by RRQR to learn diverse and independent features, achieving competitive performance even when used alone. We further introduce a sub adapter that carefully refines the major directions with minimal adjustments, providing complementary improvements over the main adapter's strong baseline performance. This design enables the dual adapters to learn distinct representations without requiring additional regularization losses. Our systematic exploitation of pre-trained subspace structures through RRQR-based initialization leads to superior domain generalization performance. RecycleLoRA achieves state-of-the-art performance on both synthetic-to-real and real-to-real generalization tasks without complex architectures or additional inference latency.
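The RRQR-based dual initialization described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rrqr_dual_lora_init`, the use of SciPy's column-pivoted QR as the rank-revealing factorization, the mapping of leading/trailing columns of Q to the major/minor subspace directions, the rank choices, and the zero-initialized A factors (so adaptation starts as the identity) are all assumptions.

```python
import numpy as np
from scipy.linalg import qr


def rrqr_dual_lora_init(W, r_main=8, r_sub=4):
    """Build dual LoRA initializations from a pretrained weight W
    via rank-revealing (column-pivoted) QR.

    Hypothetical sketch: leading columns of Q are taken as the
    dominant (major) subspace directions, trailing columns as the
    subordinate (minor) ones.
    """
    # Column-pivoted QR: Q has orthonormal columns, R's diagonal
    # decays, revealing dominant vs. subordinate directions.
    Q, R, piv = qr(W, pivoting=True, mode="economic")

    d_out, d_in = W.shape
    major = Q[:, :r_sub]        # refined gently by the sub adapter
    minor = Q[:, -r_main:]      # exploited by the main adapter

    # LoRA update is delta_W = B @ A. B is initialized from subspace
    # directions; A starts at zero so the adapted model initially
    # matches the pretrained one exactly.
    main_B, main_A = minor, np.zeros((r_main, d_in))
    sub_B, sub_A = major, np.zeros((r_sub, d_in))
    return (main_B, main_A), (sub_B, sub_A)


# Toy usage on a random stand-in for a pretrained weight matrix.
W = np.random.randn(64, 32)
(main_B, main_A), (sub_B, sub_A) = rrqr_dual_lora_init(W)
assert np.allclose(main_B @ main_A, 0)  # adaptation starts as identity
```

Initializing only the B factors from the pretrained subspace while zeroing A is one common way to keep the initial forward pass unchanged; the two adapters then update disjoint sets of directions, which is what lets them learn distinct representations without an explicit diversity regularizer.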