🤖 AI Summary
To address the distribution shifts and spurious texture generation that arise in single image defocus deblurring (SIDD) when lens-specific point spread functions (PSFs) differ across lenses and scenes at test time, this paper proposes a pixel-level, regression-oriented continual test-time adaptation (CTTA) framework. Methodologically, it designs a causal Siamese network architecture (CauSiam) that incorporates large-scale pre-trained vision-language models (e.g., CLIP) to extract universal semantic priors and integrates them so that the relationship between blurry inputs and sharp reconstructions is causally identifiable, enabling online adaptation from unlabeled target-domain data alone. Experiments show that the approach substantially improves the generalization of mainstream SIDD models under continuously changing target domains, performing favorably across cross-lens and cross-scene evaluations. Crucially, it mitigates structural distortions and spurious textures under severe blur without requiring access to source-domain data or ground-truth labels during adaptation.
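One common way to inject a semantic prior into a restoration network is feature-wise modulation, where the prior embedding is projected to per-channel scales and shifts. The sketch below illustrates that idea only; the embedding is a random stand-in for a frozen CLIP encoder's output, all shapes and projection weights are invented for illustration, and whether CauSiam uses this exact mechanism is an assumption, not a claim from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative shapes: a CLIP-style image embedding (512-d) conditioning
# a 16-channel feature map of a toy restoration network. The embedding
# here is random; in practice it would come from a frozen VLM encoder.
EMB, C, H, W = 512, 16, 8, 8
semantic_prior = rng.normal(size=EMB)      # stand-in for a CLIP image embedding
features = rng.normal(size=(C, H, W))      # intermediate restoration features

# FiLM-style modulation: project the prior to per-channel scale and shift.
W_gamma = rng.normal(size=(C, EMB)) * 0.01
W_beta = rng.normal(size=(C, EMB)) * 0.01

gamma = 1.0 + W_gamma @ semantic_prior     # per-channel scale, kept near 1
beta = W_beta @ semantic_prior             # per-channel shift
modulated = gamma[:, None, None] * features + beta[:, None, None]

print(modulated.shape)                     # same shape as the input features
```

The design point is that the semantic prior steers *which* textures the decoder reconstructs without changing the spatial resolution of the features it modulates.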
📝 Abstract
Single image defocus deblurring (SIDD) aims to restore an all-in-focus image from a defocused one. Distribution shifts in defocused images generally degrade the performance of existing methods during out-of-distribution inference. In this work, we investigate the intrinsic cause of this performance degradation and identify it as the heterogeneity of lens-specific point spread functions. Empirical evidence supports this finding, motivating us to employ a continual test-time adaptation (CTTA) paradigm for SIDD. However, traditional CTTA methods, which rely primarily on entropy minimization, cannot sufficiently exploit task-dependent information for pixel-level regression tasks such as SIDD. To address this issue, we propose a novel Siamese networks-based continual test-time adaptation framework that adapts source models to continuously changing target domains in an online manner, requiring only unlabeled target data. To further mitigate the semantically erroneous textures that source SIDD models introduce under severe degradation, we revisit the learning paradigm through a structural causal model and propose Causal Siamese networks (CauSiam). Our method leverages large-scale pre-trained vision-language models to derive discriminative universal semantic priors and integrates these priors into the Siamese networks, ensuring causal identifiability between blurry inputs and restored images. Extensive experiments demonstrate that CauSiam effectively improves the generalization performance of existing SIDD methods in continuously changing domains.
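To make the adaptation loop concrete, here is a minimal sketch (not the authors' implementation) of the Siamese test-time adaptation idea: a student model is updated online on unlabeled target samples to stay consistent with a slowly-updated exponential-moving-average (EMA) teacher, so no source data or labels are needed. The toy linear "restoration model", the hyperparameters, and the two different initializations are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                                   # toy pixel dimension
W_student = np.eye(D) + 0.1 * rng.normal(size=(D, D))   # adapted (student) branch
W_teacher = np.eye(D) + 0.1 * rng.normal(size=(D, D))   # slow Siamese (teacher) branch
LR, EMA = 1e-2, 0.99                                    # illustrative hyperparameters

def adapt_step(x):
    """One online adaptation step on a single unlabeled target sample x."""
    global W_student, W_teacher
    pseudo = W_teacher @ x                 # teacher pseudo-target (no labels needed)
    err = W_student @ x - pseudo           # prediction-consistency residual
    W_student -= LR * np.outer(err, x)     # gradient step on 0.5 * ||err||^2
    W_teacher = EMA * W_teacher + (1 - EMA) * W_student  # EMA teacher update
    return 0.5 * float(err @ err)

# Stream of unlabeled target-domain samples arriving online.
losses = [adapt_step(rng.normal(size=D)) for _ in range(200)]
print(f"mean consistency loss: {np.mean(losses[:20]):.4f} -> {np.mean(losses[-20:]):.4f}")
```

Because the teacher only drifts slowly toward the student, it supplies stable pseudo-targets while the student tracks the changing target domain; the consistency loss shrinks as the two branches agree.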