🤖 AI Summary
To address the plasticity–stability trade-off that arises when fine-tuning RGB-pretrained models for multimodal tracking, this paper proposes a sensitivity-aware regularization framework. We introduce, for the first time, tangent-space modeling of the prior sensitivity of parameters, and jointly incorporate transfer sensitivity during cross-modal adaptation. Together these form a dual-sensitivity regularization mechanism that preserves the stability of the original knowledge while enhancing cross-modal adaptability. Our method integrates tangent-space sensitivity quantification, dynamic regularization, and cross-modal transfer learning to balance generalization and plasticity. Extensive experiments demonstrate state-of-the-art performance across multiple multimodal tracking benchmarks. The source code and pretrained models will be publicly released.
📝 Abstract
This paper tackles the critical challenge of optimizing multi-modal trackers by effectively adapting models pre-trained on RGB data. Existing fine-tuning paradigms oscillate between excessive freedom and over-restriction, both of which lead to a suboptimal plasticity-stability trade-off. To mitigate this dilemma, we propose a novel sensitivity-aware regularized tuning framework, which refines the learning process by incorporating intrinsic parameter sensitivities. Through a comprehensive investigation from pre-trained to multi-modal contexts, we identify that parameters sensitive to pivotal foundational patterns and to cross-domain shifts are the primary drivers of this issue. Specifically, we first analyze the tangent space of the pre-trained weights to measure and orient prior sensitivities, dedicated to preserving generalization. We then explore transfer sensitivities during the tuning phase, emphasizing adaptability and stability. By incorporating these sensitivities as regularization terms, our method significantly enhances transferability across modalities. Extensive experiments showcase the superior performance of the proposed method, surpassing current state-of-the-art techniques across various multi-modal tracking benchmarks. The source code and models will be publicly available at https://github.com/zhiwen-xdu/SRTrack.
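The abstract does not spell out the regularizer's exact form. As a rough, hedged illustration of the general idea, a dual-sensitivity regularizer can be sketched as a sensitivity-weighted quadratic penalty on deviations from the pre-trained weights, with per-parameter sensitivities estimated here by a Fisher-style squared-gradient proxy. All function names, the squared-gradient proxy, and the quadratic-anchor form below are assumptions for exposition, not the paper's actual method:

```python
import numpy as np

def squared_grad_sensitivity(grads):
    """Fisher-style proxy for per-parameter sensitivity: the mean squared
    gradient over a batch of gradient samples. (An assumption; the paper's
    tangent-space measure is not specified in the abstract.)"""
    g = np.stack(grads)          # shape: (num_samples, num_params)
    return (g ** 2).mean(axis=0)  # shape: (num_params,)

def dual_sensitivity_penalty(theta, theta_pre, s_prior, s_transfer,
                             lam_prior=1.0, lam_transfer=1.0):
    """Sensitivity-weighted quadratic anchor toward the pre-trained weights.

    s_prior   -- prior sensitivities (e.g. measured on the pre-trained model)
    s_transfer -- transfer sensitivities (e.g. accumulated during tuning)
    Parameters with high sensitivity are penalized more strongly for drifting,
    preserving stability while leaving low-sensitivity parameters plastic.
    """
    prior_term = np.sum(s_prior * (theta - theta_pre) ** 2)
    transfer_term = np.sum(s_transfer * (theta - theta_pre) ** 2)
    return lam_prior * prior_term + lam_transfer * transfer_term

# Toy usage: two parameters, two gradient samples.
s_t = squared_grad_sensitivity([np.array([1.0, 2.0]), np.array([3.0, 0.0])])
penalty = dual_sensitivity_penalty(theta=np.array([1.0, 1.0]),
                                   theta_pre=np.zeros(2),
                                   s_prior=np.ones(2),
                                   s_transfer=s_t)
```

In training, this penalty would be added to the task loss so that gradient descent trades off adaptation against sensitivity-weighted drift from the RGB-pretrained initialization.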