🤖 AI Summary
In music information retrieval, rare instruments (e.g., harp, English horn) suffer significantly degraded recognition performance because their labels are far scarcer than those of mainstream instruments. To address this, we propose a hierarchical deep learning framework that integrates the Hornbostel-Sachs classification system. Our model jointly predicts coarse-grained instrument families and fine-grained individual instruments, employing a hierarchy-aware loss function and transfer learning to mitigate the shortage of fine-grained labels. Experiments on the MedleyDB dataset show that the proposed method achieves more stable coarse-level detection and reduces the F1-score gap between mainstream and rare instruments by 38.2%, substantially improving robustness and generalization for detecting the activity of rare instruments under limited supervision.
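The paper does not spell out its loss formulation here, so the following is only a minimal sketch of one common way to build a hierarchy-aware loss: pool fine-grained (instrument) probabilities into coarse-grained (family) probabilities via the taxonomy, then combine the negative log-likelihoods at both levels. The instrument-to-family mapping and the weighting `alpha` are illustrative assumptions, not the authors' exact design.

```python
import numpy as np

# Hypothetical taxonomy: instrument index -> Hornbostel-Sachs-style family index
# (e.g., indices 0-1 chordophones, 2-3 aerophones, 4 membranophones).
FAMILY_OF = np.array([0, 0, 1, 1, 2])

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def hierarchy_aware_loss(logits, fine_label, alpha=0.5):
    """Weighted sum of fine- and coarse-level negative log-likelihoods.

    Coarse probabilities are obtained by summing the instrument
    probabilities of each family, so the coarse term rewards the model
    even when it confuses instruments within the correct family.
    """
    p_fine = softmax(logits)
    n_families = FAMILY_OF.max() + 1
    p_coarse = np.array([p_fine[FAMILY_OF == f].sum() for f in range(n_families)])
    coarse_label = FAMILY_OF[fine_label]
    loss_fine = -np.log(p_fine[fine_label])
    loss_coarse = -np.log(p_coarse[coarse_label])
    return alpha * loss_fine + (1 - alpha) * loss_coarse

# Example: correct fine label is instrument 0 (family 0).
logits = np.array([2.0, 0.1, -1.0, 0.3, -0.5])
print(hierarchy_aware_loss(logits, fine_label=0))
```

Because pooling can only increase the probability mass assigned to the correct family, the coarse term is never larger than the fine term, which is what makes coarse-level detection more stable when fine-grained labels are scarce.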
📝 Abstract
Identifying instrument activity within audio excerpts is vital in music information retrieval, with significant implications for music cataloging and discovery. Prior deep learning work on musical instrument recognition has predominantly focused on instrument classes with ample data. Recent studies have demonstrated that hierarchical classification can detect instrument activity in orchestral music even with limited fine-grained annotations at the instrument level. We evaluate such a hierarchical classification system, based on the Hornbostel-Sachs taxonomy, on the MedleyDB dataset, which is renowned for its diversity of instruments and music genres. This work presents several strategies for integrating hierarchical structure into models and tests a new class of models for hierarchical music prediction. By bridging the gap between detailed instrument identification and group-level recognition, this study demonstrates more reliable coarse-level instrument detection, paving the way for further advances in this domain.