Weakly Supervised Multiple Instance Learning for Whale Call Detection and Localization in Long-Duration Passive Acoustic Monitoring

📅 2025-02-28
🤖 AI Summary
Problem: Detecting and localizing cetacean calls in long-term passive acoustic monitoring (PAM) is difficult because precise temporal annotations are rarely available and because audio segments lasting 2–30 minutes are costly to process. Method: We propose DSMIL-LocNet, a weakly supervised dual-stream framework trained only on bag-level labels. It combines spectral and temporal features with an attention-based instance selection strategy, so that classification over long contexts and localization over medium-length instances are optimized jointly. Contribution/Results: On an Antarctic whale dataset, longer contexts yield detection F1-scores of 0.8–0.9, while medium-length instances yield localization precision of 0.65–0.70. These results suggest that multiple instance learning (MIL) can reduce annotation effort and scale to long-duration marine recordings.
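For readers less familiar with the metric, the detection F1-score quoted above is the harmonic mean of precision and recall over bag-level predictions. The sketch below shows that arithmetic on a small made-up set of labels; the helper name `f1_score` and the toy data are illustrative assumptions and are not taken from the paper or its code.

```python
def f1_score(y_true, y_pred):
    """Binary F1 over bag-level labels (1 = bag contains a call, 0 = it does not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical bag labels and predictions, invented purely for illustration.
labels = [1, 1, 1, 0, 0, 1, 0, 1]
preds = [1, 1, 0, 0, 1, 1, 0, 1]
score = f1_score(labels, preds)  # 4 TP, 1 FP, 1 FN
```

Here precision and recall are both 4/5, so the toy score falls at the lower edge of the reported 0.8–0.9 range.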

📝 Abstract
Marine ecosystem monitoring via Passive Acoustic Monitoring (PAM) generates vast data, but deep learning often requires precise annotations and short segments. We introduce DSMIL-LocNet, a Multiple Instance Learning framework for whale call detection and localization using only bag-level labels. Our dual-stream model processes 2-30 minute audio segments, leveraging spectral and temporal features with attention-based instance selection. Tests on Antarctic whale data show longer contexts improve classification (F1: 0.8-0.9) while medium instances ensure localization precision (0.65-0.70). This suggests MIL can enhance scalable marine monitoring. Code: https://github.com/Ragib-Amin-Nihal/DSMIL-Loc
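To make the idea of attention-based instance selection concrete, here is a minimal sketch of generic attention pooling for Multiple Instance Learning. It is not the DSMIL-LocNet architecture: the function names, the `top_k` parameter, and the hand-written scores are all assumptions for illustration. In a real model the per-instance probabilities and attention logits would come from learned networks.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(instance_probs, attention_logits, top_k=2):
    """Aggregate instance scores into one bag score and pick likely call locations.

    instance_probs: per-instance call probabilities (assumed model outputs).
    attention_logits: unnormalized attention scores, one per instance.
    Returns (bag probability, attention weights, indices of the top_k instances).
    """
    weights = softmax(attention_logits)
    # The bag-level prediction is the attention-weighted average of instance scores.
    bag_prob = sum(w * p for w, p in zip(weights, instance_probs))
    # Instances with the highest attention serve as localization candidates.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return bag_prob, weights, ranked[:top_k]


# Hypothetical values for a bag of five instances; instances 2 and 3 hold a call.
instance_probs = [0.05, 0.10, 0.90, 0.85, 0.08]
attention_logits = [-1.0, -0.5, 2.0, 1.5, -1.0]
bag_prob, weights, selected = attention_mil_pool(instance_probs, attention_logits)
```

Only the bag-level output would be compared against a label during training; the attention ranking is what allows localization without instance-level annotations.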

Problem

Research questions and friction points this paper is trying to address.

Detecting and localizing whale calls with only weak, bag-level supervision
Processing long audio segments (2–30 minutes) without precise temporal annotations
Scaling marine acoustic monitoring through a MIL framework
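The weak-supervision setting above can be pictured as follows: each long recording is a bag, it is cut into shorter instances, and only the bag carries a label. The sketch below shows one simple way to cut a bag into fixed-length windows. The function name, the 15-second instance length, and the non-overlapping hop are hypothetical choices for illustration; the paper's actual segmentation settings are not stated in this summary.

```python
def split_into_instances(bag_duration_s, instance_s, hop_s):
    """Return (start, end) times in seconds for the instances inside one bag.

    Trailing audio shorter than a full instance is dropped in this sketch.
    """
    if instance_s <= 0 or hop_s <= 0:
        raise ValueError("instance_s and hop_s must be positive")
    windows = []
    start = 0.0
    while start + instance_s <= bag_duration_s:
        windows.append((start, start + instance_s))
        start += hop_s
    return windows


# The two extremes of the 2-30 minute range, with assumed 15 s instances.
short_bag = split_into_instances(120.0, 15.0, 15.0)   # 2-minute bag
long_bag = split_into_instances(1800.0, 15.0, 15.0)   # 30-minute bag
```

With these assumed settings a 2-minute bag holds 8 instances and a 30-minute bag holds 120, which illustrates why a single bag-level label becomes a much weaker signal as the context grows.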

Innovation

Methods, ideas, or system contributions that make the work stand out.

Applies Multiple Instance Learning to whale call detection and localization using only bag-level labels
Processes 2–30 minute audio segments with a dual-stream model
Combines spectral and temporal features with attention-based instance selection