Spiking Neural Networks with Temporal Attention-Guided Adaptive Fusion for imbalanced Multi-modal Learning

📅 2025-05-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address modality imbalance and temporal misalignment in multimodal Spiking Neural Networks (SNNs), this paper proposes the Temporal Attention-guided Adaptive Fusion (TAAF) framework. TAAF innovatively integrates temporal attention, learnable time-warping, and a modality-aware temporal balancing fusion loss to enable timestep-wise dynamic weight allocation and synchronized convergence rate adjustment across modalities—mimicking biological cortical multisensory integration. The entire architecture operates under event-driven computation, ensuring compatibility with neuromorphic hardware. Evaluated on CREMA-D, AVE, and EAD benchmarks, TAAF achieves state-of-the-art accuracies of 77.55%, 70.65%, and 97.5%, respectively—outperforming existing SNN baselines. Moreover, it accelerates convergence by 23% and reduces inference energy consumption by 31%. This work establishes a novel paradigm for energy-efficient, biologically interpretable multimodal brain-inspired computing.

Technology Category

Application Category

📝 Abstract
Multimodal spiking neural networks (SNNs) hold significant potential for energy-efficient sensory processing but face critical challenges in modality imbalance and temporal misalignment. Current approaches suffer from uncoordinated convergence speeds across modalities and static fusion mechanisms that ignore time-varying cross-modal interactions. We propose the temporal attention-guided adaptive fusion framework for multimodal SNNs with two synergistic innovations: 1) The Temporal Attention-guided Adaptive Fusion (TAAF) module that dynamically assigns importance scores to fused spiking features at each timestep, enabling hierarchical integration of temporally heterogeneous spike-based features; 2) The temporal adaptive balanced fusion loss that modulates learning rates per modality based on the above attention scores, preventing dominant modalities from monopolizing optimization. The proposed framework implements adaptive fusion, especially in the temporal dimension, and alleviates the modality imbalance during multimodal learning, mimicking cortical multisensory integration principles. Evaluations on CREMA-D, AVE, and EAD datasets demonstrate state-of-the-art performance (77.55%, 70.65% and 97.5%accuracy, respectively) with energy efficiency. The system resolves temporal misalignment through learnable time-warping operations and faster modality convergence coordination than baseline SNNs. This work establishes a new paradigm for temporally coherent multimodal learning in neuromorphic systems, bridging the gap between biological sensory processing and efficient machine intelligence.
Problem

Research questions and friction points this paper is trying to address.

Address modality imbalance in multimodal SNNs
Resolve temporal misalignment in spike-based features
Improve energy-efficient sensory processing performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic importance scores for fused spiking features
Modulates learning rates per modality adaptively
Learnable time-warping resolves temporal misalignment
🔎 Similar Papers
No similar papers found.
J
Jiangrong Shen
Faculty of Electronic and Information Engineering, Xi’an Jiaotong University; Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University; National Key Lab of Human-Machine Hybrid Augmented Intelligence, Xi’an Jiaotong University; State Key Lab of Brain-Machine Intelligence, Zhejiang University
Y
Yulin Xie
Faculty of Electronic and Information Engineering, Xi’an Jiaotong University
Q
Qi Xu
School of Computer Science, Dalian University of Technology
Gang Pan
Gang Pan
Tianjin University
Computer visionMultimodalAI
Huajin Tang
Huajin Tang
Zhejiang University, China
Brain-inspired AIneuroroboticsspiking neural networksbrain-inspired computing
Badong Chen
Badong Chen
Professor of Xi'an Jiaotong University, Xi'an, China
signal processingmachine learningbrain machine interfacesrobotics