Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge in bioacoustic classification where the reliability and informativeness of multi-source evidence—such as audio signals and spatiotemporal context—vary across samples, rendering traditional fixed-weight fusion strategies suboptimal. To this end, we propose FINCH, a framework that adaptively fuses predictions from a pretrained audio classifier and a structured spatiotemporal predictor via a learnable, sample-level gating mechanism. FINCH explicitly constrains the influence of contextual information and incorporates a built-in audio fallback to ensure risk-controlled inference. By dynamically weighting evidence based on uncertainty and information content, the method subsumes pure audio models as a special case within its fusion family, enhancing both robustness and interpretability. Experiments demonstrate that FINCH achieves state-of-the-art performance on the CBI dataset and matches or outperforms existing approaches across multiple BirdSet subsets, significantly surpassing fixed-weight fusion and audio-only baselines.

📝 Abstract
Many machine learning systems have access to multiple sources of evidence for the same prediction target, yet these sources often differ in reliability and informativeness across inputs. In bioacoustic classification, species identity may be inferred both from the acoustic signal and from spatiotemporal context such as location and season; while Bayesian inference motivates multiplicative evidence combination, in practice we typically only have access to discriminative predictors rather than calibrated generative models. We introduce \textbf{F}usion under \textbf{IN}dependent \textbf{C}onditional \textbf{H}ypotheses (\textbf{FINCH}), an adaptive log-linear evidence fusion framework that integrates a pre-trained audio classifier with a structured spatiotemporal predictor. FINCH learns a per-sample gating function that estimates the reliability of contextual information from uncertainty and informativeness statistics. The resulting fusion family \emph{contains} the audio-only classifier as a special case and explicitly bounds the influence of contextual evidence, yielding a risk-contained hypothesis class with an interpretable audio-only fallback. Across benchmarks, FINCH consistently outperforms fixed-weight fusion and audio-only baselines, improving robustness and error trade-offs even when contextual information is weak in isolation. We achieve state-of-the-art performance on CBI and competitive or improved performance on several subsets of BirdSet using a lightweight, interpretable, evidence-based approach. Code is available: \texttt{\href{https://anonymous.4open.science/r/birdnoise-85CD/README.md}{anonymous-repository}}
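The abstract describes FINCH as an adaptive log-linear fusion whose per-sample gate bounds the influence of contextual evidence and recovers the audio-only classifier when the gate is zero. The sketch below illustrates that fusion rule in miniature; the function name `finch_fuse`, the scalar `gate` input, and `alpha_max` are illustrative assumptions — the paper's actual gating network, its uncertainty and informativeness statistics, and its exact parameterization are not reproduced here.

```python
import numpy as np

def finch_fuse(audio_logits, context_logits, gate, alpha_max=1.0):
    """Adaptive log-linear evidence fusion (illustrative sketch).

    `gate` in [0, 1] stands in for a learned per-sample reliability
    estimate of the contextual evidence; `alpha_max` bounds its
    influence, matching the abstract's "risk-contained" design.
    """
    alpha = alpha_max * gate                       # bounded context weight
    fused = audio_logits + alpha * context_logits  # log-linear combination
    # gate == 0 => fused == audio_logits: the audio-only fallback
    # is contained in the fusion family as a special case.
    e = np.exp(fused - fused.max())                # stable softmax
    return e / e.sum()

# Audio evidence favors class 1; spatiotemporal context favors class 2.
audio = np.array([0.1, 2.0, 0.3])
ctx = np.array([0.0, 0.5, 1.5])
p_fused = finch_fuse(audio, ctx, gate=0.5)   # blended prediction
p_audio = finch_fuse(audio, ctx, gate=0.0)   # audio-only special case
```

With `gate=0.0` the output is exactly the softmax of the audio logits, which is the interpretable fallback the abstract emphasizes; increasing the gate shifts probability mass toward species favored by the spatiotemporal context.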
Problem

Research questions and friction points this paper is trying to address.

bioacoustic classification
evidence fusion
spatiotemporal context
multi-source evidence
reliability weighting
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive fusion
evidence weighting
spatiotemporal context
bioacoustic classification
uncertainty-aware gating