Controlling Contrastive Self-Supervised Learning with Knowledge-Driven Multiple Hypothesis: Application to Beat Tracking

📅 2025-10-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Beat tracking is inherently ambiguous: a single musical excerpt can support several plausible beat sequences, for example because listeners differ in how they perceive rhythm, and conventional supervised learning struggles to model this. To address it, we propose a knowledge-guided, multi-hypothesis contrastive self-supervised pre-training approach. The approach (1) considers multiple hypotheses about which samples count as positives; (2) uses a knowledge-based scoring function to retain only the most plausible hypotheses; and (3) trains the model to learn representations compatible with the retained hypotheses, so that different rhythmic interpretations are accommodated during pre-training. After fine-tuning on labeled data, the model outperforms existing methods on standard beat tracking benchmarks. This suggests that combining domain knowledge with multi-hypothesis selection benefits representation learning for rhythm analysis.

📝 Abstract

Ambiguities in data and problem constraints can lead to diverse, equally plausible outcomes for a machine learning task. In beat and downbeat tracking, for instance, different listeners may adopt different rhythmic interpretations, none of which is necessarily incorrect. To address this, we propose a contrastive self-supervised pre-training approach that leverages multiple hypotheses about possible positive samples in the data. Our model is trained to learn representations compatible with several such hypotheses, which are selected with a knowledge-based scoring function that retains the most plausible ones. When fine-tuned on labeled data, our model outperforms existing methods on standard benchmarks, showcasing the advantages of integrating domain knowledge with multi-hypothesis selection, particularly in music representation learning.
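
The abstract does not spell out the training objective, so the following is only a minimal sketch of how a contrastive loss over several positive-sample hypotheses might look. It assumes an InfoNCE-style loss, cosine similarity, and a simple "keep the top-scoring hypotheses, then average" rule; the function names `info_nce` and `multi_hypothesis_loss`, the `keep` and `temperature` parameters, and the toy data are all hypothetical and not taken from the paper.

```python
import numpy as np

def info_nce(anchor, candidates, positive_idx, temperature=0.1):
    """InfoNCE-style loss for one anchor; candidates[positive_idx] are the positives."""
    # Cosine similarity between the anchor and every candidate embedding.
    a = anchor / np.linalg.norm(anchor)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    logits = c @ a / temperature
    # Numerically stable log-softmax over all candidates.
    shifted = logits - logits.max()
    log_prob = shifted - np.log(np.exp(shifted).sum())
    # Mean negative log-likelihood of the hypothesised positives.
    return -log_prob[positive_idx].mean()

def multi_hypothesis_loss(anchor, candidates, hypotheses, scores,
                          keep=2, temperature=0.1):
    """Average the loss over the `keep` highest-scoring hypotheses (assumed rule)."""
    order = np.argsort(scores)[::-1][:keep]
    losses = [info_nce(anchor, candidates, hypotheses[i], temperature)
              for i in order]
    return float(np.mean(losses)), [int(i) for i in order]

# Toy data: one anchor frame, six candidate frames, three hypotheses about
# which candidates are positives (e.g. frames on the same hypothesised beat grid).
rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
candidates = rng.normal(size=(6, 8))
hypotheses = [[0, 2, 4], [0, 3], [1, 5]]
scores = np.array([0.9, 0.2, 0.6])  # placeholder plausibility scores

loss, kept = multi_hypothesis_loss(anchor, candidates, hypotheses, scores, keep=2)
print(kept, loss)
```

Averaging over the retained hypotheses is one of several possible choices; weighting each hypothesis by its score, or taking the minimum loss across hypotheses, would be equally reasonable variants under the same description.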

Problem

Research questions and friction points this paper is trying to address.

Addressing ambiguous data with multiple plausible outcomes

Leveraging domain knowledge for multi-hypothesis selection

Improving beat tracking through contrastive self-supervised learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses contrastive self-supervised learning with multiple hypotheses

Selects plausible hypotheses via knowledge-based scoring function

Learns representations compatible with diverse rhythmic interpretations
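
The page does not describe the scoring function itself. As an illustration of what a knowledge-based score for a beat hypothesis could look like, the sketch below combines two common musical heuristics: beats should be roughly evenly spaced, and moderate tempi are more likely than extreme ones. Both heuristics, the 120 BPM preferred tempo, and the name `score_beat_hypothesis` are assumptions made for this example, not the paper's method.

```python
import numpy as np

def score_beat_hypothesis(beat_times, preferred_bpm=120.0, tempo_sigma=1.0):
    """Toy plausibility score in [0, 1] for one candidate beat sequence (seconds)."""
    intervals = np.diff(np.asarray(beat_times, dtype=float))
    # Need at least two strictly positive inter-beat intervals to judge regularity.
    if intervals.size < 2 or np.any(intervals <= 0):
        return 0.0
    # Regularity term: 1.0 for perfectly even spacing, lower as spacing varies.
    regularity = 1.0 / (1.0 + np.std(intervals) / np.mean(intervals))
    # Tempo prior: Gaussian on the distance from the preferred tempo, in octaves.
    bpm = 60.0 / np.median(intervals)
    tempo_prior = np.exp(-0.5 * (np.log2(bpm / preferred_bpm) / tempo_sigma) ** 2)
    return float(regularity * tempo_prior)

# Three hypothetical readings of the same excerpt.
candidates = {
    "120 BPM grid": np.arange(0.0, 4.0, 0.5),
    "60 BPM grid": np.arange(0.0, 4.0, 1.0),
    "irregular": np.array([0.0, 0.4, 1.1, 1.5, 2.3]),
}
scores = {name: score_beat_hypothesis(t) for name, t in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Note that the half-tempo grid still receives a non-zero score here: a soft prior keeps metrically related interpretations available as hypotheses instead of discarding them outright, which matches the idea that several readings can be plausible at once.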
