🤖 AI Summary
This work addresses the insufficient modeling of temporal dynamics and harmonic spectral structures in heart murmur classification (HMC). We propose a cross-modal learning framework that jointly leverages neural audio codec representations (NACRs, e.g., EnCodec) and handcrafted spectral features (SFs, e.g., MFCCs). Our key innovation is a bandit-based cross-attention mechanism, inspired by multi-armed bandits, which dynamically selects and reweights critical attention heads to suppress modality-specific noise while enhancing discriminative feature fusion. The method is fully end-to-end trainable without requiring auxiliary annotations. Evaluated on standard phonocardiogram datasets, it significantly outperforms unimodal baselines and conventional feature concatenation or weighted fusion approaches, establishing new state-of-the-art performance for HMC.
📝 Abstract
In this study, we focus on heart murmur classification (HMC) and hypothesize that combining neural audio codec representations (NACRs), such as EnCodec, with spectral features (SFs), such as MFCCs, will yield superior performance. We believe such fusion will trigger their complementary behavior: NACRs excel at capturing fine-grained acoustic patterns such as rhythm changes, while SFs capture frequency-domain properties such as harmonic structure and spectral energy distribution, which are crucial for analyzing the complex nature of heart sounds. To this end, we propose BAOMI, a novel framework built on a bandit-based cross-attention mechanism for effective fusion. Here, a bandit agent assigns greater weight to the most important heads in the multi-head cross-attention mechanism, helping to mitigate noise. With BAOMI, we report the best performance in comparison to individual NACRs, SFs, and baseline fusion techniques, setting a new state-of-the-art.
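To make the core idea concrete, below is a minimal, illustrative sketch of bandit-weighted multi-head cross-attention in NumPy. This is not the authors' implementation: the class name `HeadBandit`, the epsilon-greedy update, the reward signal, and all dimensions are assumptions chosen for illustration. Queries come from one modality (e.g., NACRs) and keys/values from the other (e.g., MFCCs); a simple bandit keeps a running reward estimate per head and converts it into soft fusion weights over head outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(q, kv, num_heads):
    """Toy multi-head cross-attention: queries from one modality,
    keys/values from the other. Returns per-head outputs with
    shape (num_heads, seq_q, d_head)."""
    seq_q, d_model = q.shape
    d_head = d_model // num_heads
    outs = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        qh, kh, vh = q[:, sl], kv[:, sl], kv[:, sl]
        attn = softmax(qh @ kh.T / np.sqrt(d_head), axis=-1)
        outs.append(attn @ vh)
    return np.stack(outs)

class HeadBandit:
    """Hypothetical epsilon-greedy bandit over attention heads:
    tracks a running reward estimate per head and turns the
    estimates into soft fusion weights."""
    def __init__(self, num_heads, eps=0.1):
        self.values = np.zeros(num_heads)
        self.counts = np.zeros(num_heads)
        self.eps = eps

    def head_weights(self):
        # Soft weights favoring heads with higher estimated reward.
        return softmax(self.values)

    def select(self, rng):
        # Explore a random head with prob. eps, else exploit the best.
        if rng.random() < self.eps:
            return int(rng.integers(len(self.values)))
        return int(np.argmax(self.values))

    def update(self, head, reward):
        # Incremental mean update of the chosen head's value.
        self.counts[head] += 1
        self.values[head] += (reward - self.values[head]) / self.counts[head]

# Example: fuse per-head outputs using the bandit's soft weights.
rng = np.random.default_rng(0)
q = rng.standard_normal((5, 16))    # 5 query frames, d_model=16
kv = rng.standard_normal((7, 16))   # 7 key/value frames
heads = multi_head_cross_attention(q, kv, num_heads=4)
bandit = HeadBandit(num_heads=4)
bandit.update(2, 1.0)               # e.g., reward = validation gain
fused = np.tensordot(bandit.head_weights(), heads, axes=1)
```

In practice the reward driving `update` would come from a task signal (e.g., classification improvement), so noisy heads decay in weight while discriminative heads dominate the fused representation.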