🤖 AI Summary
This work addresses the domain rigidity and annotation dependency inherent in static training for camouflaged object detection (COD) by introducing test-time adaptation to this task for the first time. The proposed hierarchical consistency learning (HCL) framework recalibrates features at test time without annotations through three mechanisms: hierarchical representation reconstruction (HRR), task affinity guidance (TAG), and prototype consistency calibration (PCC). To mitigate feature entanglement while preserving semantic invariance, the method combines spatial reconstruction and dual-stream frequency-domain decomposition with channel-wise affinity propagation. Experiments on four camouflaged and four underwater object datasets under three degradation settings show that the approach consistently outperforms state-of-the-art models, indicating improved generalization and robustness under distribution shifts.
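As a rough illustration of two ideas named in the summary, the NumPy sketch below splits a single feature map into low- and high-frequency streams with a radial FFT mask, and computes a channel-to-channel affinity matrix that re-mixes the channels of another branch. It is not the paper's implementation: the function names `frequency_split` and `channel_affinity`, the `radius_ratio` parameter, the cosine-plus-softmax form of the affinity, and the use of plain NumPy arrays are all assumptions made for illustration.

```python
import numpy as np


def frequency_split(feature, radius_ratio=0.25):
    """Split a 2D map (H, W) into low- and high-frequency streams.

    Assumption: a hard radial mask in the centered FFT spectrum; the paper
    may use a different (possibly learned) decomposition.
    """
    h, w = feature.shape
    # Move the zero-frequency (DC) component to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(feature))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius_ratio * min(h, w)
    # The two masks are complementary, so low + high reconstructs the input.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high


def channel_affinity(source, target):
    """Re-mix the channels of `target` using channel affinities of `source`.

    source, target: arrays of shape (C, H, W).
    Returns (propagated, affinity) with shapes (C, H, W) and (C, C).
    """
    c = source.shape[0]
    flat = source.reshape(c, -1)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    # Cosine similarity between every pair of channels.
    affinity = flat @ flat.T
    # Row-wise softmax so each output channel is a convex mix of target channels.
    affinity = np.exp(affinity - affinity.max(axis=1, keepdims=True))
    affinity = affinity / affinity.sum(axis=1, keepdims=True)
    propagated = (affinity @ target.reshape(c, -1)).reshape(target.shape)
    return propagated, affinity


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.standard_normal((32, 32))
    low, high = frequency_split(feature_map)
    print("reconstruction ok:", np.allclose(low + high, feature_map))

    spatial = rng.standard_normal((8, 32, 32))
    spectral = rng.standard_normal((8, 32, 32))
    mixed, aff = channel_affinity(spatial, spectral)
    print("mixed shape:", mixed.shape, "affinity shape:", aff.shape)
```

Because the two masks partition the spectrum, the streams sum back to the original map, which makes the split easy to sanity-check before either stream is processed further.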
📝 Abstract
Camouflaged object detection (COD) aims to localize targets whose physical attributes make them nearly indistinguishable from their backgrounds. Existing methods, constrained by the static train-then-freeze paradigm, suffer from domain rigidity and annotation dependency, which limits their adaptability to scene variations and unseen camouflage patterns. To overcome these limitations, we propose the hierarchical consistency learning (HCL) framework, which integrates test-time adaptation for dynamic representation recalibration. Specifically, we design hierarchical representation reconstruction (HRR) to alleviate feature entanglement by combining spatial reconstruction with dual-stream frequency-domain decomposition, enhancing robustness against appearance homogenization. Pixel-level and spectrum-level inference provide structural and contextual priors. We further introduce task affinity guidance (TAG) to propagate knowledge across branches via channel-wise affinity, aligning local discriminative cues and mitigating semantic drift. To ensure semantic invariance, we formulate prototype consistency calibration (PCC), which aggregates region features into compact prototypes and establishes prototype-feature similarity. This imposes implicit, hierarchical constraints that bridge task and representation gaps. Extensive experiments across four camouflaged and four underwater object benchmarks, under three degradation settings, demonstrate that our method consistently outperforms state-of-the-art approaches, highlighting its robustness and generalization under distribution shifts.
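The prototype step described in the abstract (aggregating region features into a compact prototype, then measuring prototype-feature similarity) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' PCC module: masked average pooling, cosine similarity, the clipped mean-squared consistency term, and the names `masked_prototype`, `prototype_similarity`, and `consistency_loss` are all hypothetical choices made for illustration.

```python
import numpy as np


def masked_prototype(features, mask):
    """Aggregate region features into one prototype by masked average pooling.

    features: (C, H, W) feature map; mask: (H, W) soft region mask in [0, 1].
    Returns a prototype vector of shape (C,).
    """
    weights = mask / (mask.sum() + 1e-8)
    return (features * weights).sum(axis=(1, 2))


def prototype_similarity(features, prototype):
    """Cosine similarity between the prototype and the feature at every pixel.

    Returns an (H, W) map with values in [-1, 1].
    """
    unit_features = features / (np.linalg.norm(features, axis=0, keepdims=True) + 1e-8)
    unit_prototype = prototype / (np.linalg.norm(prototype) + 1e-8)
    return np.tensordot(unit_prototype, unit_features, axes=(0, 0))


def consistency_loss(prediction, similarity):
    """Mean squared gap between a prediction and the clipped similarity map.

    Assumption: similarity clipped to [0, 1] acts as a soft pseudo-mask.
    """
    pseudo_mask = np.clip(similarity, 0.0, 1.0)
    return float(np.mean((prediction - pseudo_mask) ** 2))


if __name__ == "__main__":
    # Toy example: channel 0 fires on the object, channel 1 on the background.
    mask = np.zeros((6, 6))
    mask[2:4, 2:4] = 1.0
    features = np.stack([mask, 1.0 - mask])

    prototype = masked_prototype(features, mask)
    similarity = prototype_similarity(features, prototype)
    print("prototype:", np.round(prototype, 3))
    print("loss:", consistency_loss(mask, similarity))
```

In a test-time setting one might pass the model's own soft prediction as `mask`, so that no annotation is needed; that pairing is an assumption here rather than something the abstract specifies.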