PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition

๐Ÿ“… 2025-06-01
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address the limited modeling capacity of single self-supervised pretraining models in speech emotion recognition (SER), this paper proposes the first heterogeneous collaborative framework integrating Mamba and Transformer attention mechanisms. Departing from mainstream homogeneous attention-based fusion paradigms, our method employs a parallel dual-branch architecture, introduces Hadamard product-based fine-grained feature interaction, and leverages optimal transport to align heterogeneous representation spaces. This work provides the first empirical validation of the complementary inductive biases between Mambaโ€™s efficient sequential modeling and Transformer attentionโ€™s capability in capturing long-range dependencies. Evaluated on standard SER benchmarks, the framework achieves state-of-the-art performance, improving accuracy by 3.2โ€“5.8 percentage points over individual models, homogeneous fusion variants, and conventional fusion approaches. The results significantly advance research on collaborative learning among heterogeneous self-supervised models.

Technology Category

Application Category

๐Ÿ“ Abstract
The emergence of Mamba as an alternative to attention-based architectures has led to the development of Mamba-based self-supervised learning (SSL) pre-trained models (PTMs) for speech and audio processing. Recent studies suggest that these models achieve comparable or superior performance to state-of-the-art (SOTA) attention-based PTMs for speech emotion recognition (SER). Motivated by prior work demonstrating the benefits of PTM fusion across different speech processing tasks, we hypothesize that leveraging the complementary strengths of Mamba-based and attention-based PTMs will enhance SER performance beyond the fusion of homogenous attention-based PTMs. To this end, we introduce a novel framework, PARROT that integrates parallel branch fusion with Optimal Transport and Hadamard Product. Our approach achieves SOTA results against individual PTMs, homogeneous PTMs fusion, and baseline fusion techniques, thus, highlighting the potential of heterogeneous PTM fusion for SER.
Problem

Research questions and friction points this paper is trying to address.

Integrating Mamba and attention-based PTMs for SER
Enhancing SER via heterogeneous PTM fusion
Optimizing fusion with Hadamard Product and Optimal Transport
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel branch fusion of Mamba and attention PTMs
Optimal Transport for heterogeneous PTM integration
Hadamard Product enhances complementary strengths
๐Ÿ”Ž Similar Papers
No similar papers found.