🤖 AI Summary
This work addresses fiber-optic acoustic recognition—a low-resource, domain-shift-intensive downstream task (e.g., gunshot and firework detection). We propose a support-set-driven adaptation framework for CLAP (Contrastive Language–Audio Pretraining). Methodologically, we introduce the first support-set linear-interpolation adaptation mechanism, which combines the implicit knowledge captured by fine-tuning with explicit knowledge retrieved from a support-set memory, bridging CLAP and few-shot domain adaptation. This mechanism substantially improves generalization across diverse fiber-optic environments and operating conditions. Our approach achieves state-of-the-art performance on both a laboratory-recorded fiber-optic ESC-50 benchmark and real-world gunshot/firework recordings collected via fiber-optic sensors. To foster reproducibility and community advancement, we publicly release both the source code and a newly curated fiber-optic acoustic dataset.
📝 Abstract
Contrastive Language-Audio Pretraining (CLAP) models have demonstrated strong performance across a wide range of acoustic recognition tasks, and adapting CLAP to fiber-optic acoustic recognition—an important downstream task for environmental sensing—has become an active research area. Because fiber-optic sensors are non-conventional acoustic sensors with unique frequency responses and noise characteristics, fiber-optic acoustic recognition presents a challenging, domain-specific, low-shot deployment setting with significant domain shifts. To address these challenges, we propose a support-based adaptation method, CLAP-S, which linearly interpolates a CLAP Adapter with the Support Set, leveraging both implicit knowledge acquired through fine-tuning and explicit knowledge retrieved from memory for cross-domain generalization. Experimental results show that our method delivers competitive performance on both laboratory-recorded fiber-optic ESC-50 datasets and a real-world fiber-optic gunshot-firework dataset. Our research also provides valuable insights for other downstream acoustic recognition tasks. The code and gunshot-firework dataset are available at https://github.com/Jingchensun/clap-s.
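The core idea—blending a fine-tuned adapter's logits with class scores retrieved from a cached support set—can be sketched as follows. This is a minimal illustration only: the function names, the similarity-sharpening parameter `beta`, and the mixing weight `alpha` are assumptions for exposition, not the paper's exact formulation (which is in the linked repository).

```python
import numpy as np

def support_retrieval_logits(query_emb, support_embs, support_labels,
                             num_classes, beta=5.0):
    """Explicit knowledge: class scores from cosine-similarity retrieval
    over cached support-set audio embeddings (hypothetical sketch)."""
    q = query_emb / np.linalg.norm(query_emb)
    s = support_embs / np.linalg.norm(support_embs, axis=1, keepdims=True)
    # Sharpen similarities so near-duplicates of support examples dominate
    affinity = np.exp(-beta * (1.0 - s @ q))          # shape (N,)
    one_hot = np.eye(num_classes)[support_labels]      # shape (N, C)
    return affinity @ one_hot                          # shape (C,)

def clap_s_logits(adapter_logits, retrieval_logits, alpha=0.5):
    """Linear interpolation of the fine-tuned adapter's (implicit)
    logits with the support-set (explicit) retrieval logits."""
    return alpha * adapter_logits + (1.0 - alpha) * retrieval_logits
```

With `alpha=1.0` the model falls back to the fine-tuned adapter alone; with `alpha=0.0` it behaves as a pure nearest-support retriever, so the interpolation weight trades off the two knowledge sources.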