Model Merging Improves Zero-Shot Generalization in Bioacoustic Foundation Models

📅 2025-11-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
NatureLM, when fine-tuned for bioacoustics, improves benchmark performance but suffers a severe degradation in complex instruction following—e.g., accuracy drops sharply when required to output both common and scientific names—revealing an inherent trade-off between domain adaptation and instruction generalization. To address this, we propose a parameter-space interpolation fusion strategy: linearly interpolating the fine-tuned NatureLM with its base language model, augmented by a structured instruction prompting mechanism. This jointly preserves domain-specific knowledge while restoring zero-shot instruction comprehension. Our approach achieves over 200% relative improvement on cross-species closed-set zero-shot classification, establishing a new state-of-the-art. Crucially, it is the first method to systematically mitigate instruction flexibility degradation induced by fine-tuning—without compromising domain expertise.

Technology Category

Application Category

📝 Abstract
Foundation models capable of generalizing across species and tasks represent a promising new frontier in bioacoustics, with NatureLM being one of the most prominent examples. While its domain-specific fine-tuning yields strong performance on bioacoustic benchmarks, we observe that it also introduces trade-offs in instruction-following flexibility. For instance, NatureLM achieves high accuracy when prompted for either the common or scientific name individually, but its accuracy drops significantly when both are requested in a single prompt. We address this by applying a simple model merging strategy that interpolates NatureLM with its base language model, recovering instruction-following capabilities with minimal loss of domain expertise. Finally, we show that the merged model exhibits markedly stronger zero-shot generalization, achieving over a 200% relative improvement and setting a new state-of-the-art in closed-set zero-shot classification of unseen species.
Problem

Research questions and friction points this paper is trying to address.

Bioacoustic models lose instruction-following flexibility after domain-specific fine-tuning
Model accuracy drops significantly when handling complex multi-part prompts simultaneously
Specialized models show limited zero-shot generalization capability on unseen species
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model merging strategy improves zero-shot generalization
Interpolates domain-specific model with base language model
Recovers instruction-following while maintaining domain expertise
🔎 Similar Papers
2024-09-15IEEE International Conference on Acoustics, Speech, and Signal ProcessingCitations: 0