🤖 AI Summary
Robust 3D perception for autonomous driving degrades under dynamic, unknown test scenarios: existing test-time adaptation (TTA) methods suffer from optimization instability and susceptibility to sharp minima, while model fusion based on linear mode connectivity (LMC) incurs prohibitive computational overhead. This paper proposes CodeMerge, a lightweight codebook-guided model fusion framework. CodeMerge represents each checkpoint with a low-dimensional latent fingerprint and builds a key-value codebook, enabling adaptive fusion in latent space with merging coefficients dynamically weighted by ridge leverage scores. This eliminates redundant model loading and repeated forward passes, and implicitly exploits linear mode connectivity without explicit interpolation. On nuScenes-C, CodeMerge improves NDS by 14.9%; on nuScenes-to-KITTI domain transfer, it boosts detection mAP by 7.6%. It also benefits downstream tasks including online mapping, motion forecasting, and motion planning.
📝 Abstract
Maintaining robust 3D perception under dynamic and unpredictable test-time conditions remains a critical challenge for autonomous driving systems. Existing test-time adaptation (TTA) methods often fail in high-variance tasks like 3D object detection due to unstable optimization and sharp minima. While recent model merging strategies based on linear mode connectivity (LMC) offer improved stability by interpolating between fine-tuned checkpoints, they are computationally expensive, requiring repeated checkpoint access and multiple forward passes. In this paper, we introduce CodeMerge, a lightweight and scalable model merging framework that bypasses these limitations by operating in a compact latent space. Instead of loading full models, CodeMerge represents each checkpoint with a low-dimensional fingerprint derived from the source model's penultimate features and constructs a key-value codebook. We compute merging coefficients using ridge leverage scores on these fingerprints, enabling efficient model composition without compromising adaptation quality. Our method achieves strong performance across challenging benchmarks, improving end-to-end 3D detection by 14.9% NDS on nuScenes-C and LiDAR-based detection by over 7.6% mAP on nuScenes-to-KITTI, while benefiting downstream tasks such as online mapping, motion prediction, and planning even without training. Code and pretrained models are released in the supplementary material.
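The merging-coefficient step can be sketched with standard ridge leverage scores: each checkpoint's fingerprint is a row of a matrix `F`, its leverage score is computed against the regularized Gram matrix, and the scores are normalized into merging weights. This is a minimal illustration, not the paper's implementation; the fingerprint dimensions, the regularizer `lam`, and the simple normalization are assumptions introduced here.

```python
import numpy as np

def ridge_leverage_scores(F, lam=1.0):
    """Ridge leverage score of each fingerprint (row) f_i of F:
    tau_i = f_i^T (F^T F + lam * I)^{-1} f_i, each in (0, 1) for lam > 0."""
    d = F.shape[1]
    G = F.T @ F + lam * np.eye(d)
    sol = np.linalg.solve(G, F.T)          # (d, n); solve, not explicit inverse
    return np.einsum("ij,ji->i", F, sol)   # diagonal of F G^{-1} F^T

def merge_weights(F, lam=1.0):
    """Normalize leverage scores into convex merging coefficients,
    usable as theta_merged = sum_i w_i * theta_i over checkpoints."""
    tau = ridge_leverage_scores(F, lam)
    return tau / tau.sum()

# Hypothetical example: 4 checkpoints, 8-dimensional latent fingerprints.
rng = np.random.default_rng(0)
F = rng.standard_normal((4, 8))
w = merge_weights(F)
```

A checkpoint whose fingerprint is poorly explained by the others receives a higher leverage score, so the merge leans toward checkpoints carrying distinctive information; `lam` damps this effect.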