SMSAT: A Multimodal Acoustic Dataset and Deep Contrastive Learning Framework for Affective and Physiological Modeling of Spiritual Meditation

📅 2025-05-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the differential affective and physiological impacts of three auditory stimuli—spiritual meditation, music, and natural silence—to advance affective computing and mental health technologies. Method: We introduce SMSAT, the first multimodal audio-physiological-affective dataset specifically designed for spiritual meditation research, and propose a deep contrastive learning–based audio encoder alongside the Calmness Analysis Model (CAM), which integrates 25 handcrafted and learned features. Temporal modeling, ANOVA, and paired t-tests are jointly employed for high-fidelity cross-condition affective state modeling. Contribution/Results: Our approach achieves 99.99% audio-condition classification accuracy—nearly 10 percentage points above current state-of-the-art. Crucially, we report the first empirical evidence that spiritual meditation elicits significantly greater heart rate variability (HRV) responses compared to music or natural silence, substantiating its distinct autonomic nervous system regulatory effect.
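The summary notes that CAM integrates 25 handcrafted and learned features, but the page does not enumerate them. As a minimal sketch, a few classic handcrafted acoustic descriptors (chosen here for illustration, not confirmed as part of the paper's feature set) might be computed like this:

```python
# Minimal sketch of a few classic handcrafted acoustic features; the
# specific 25 features used by CAM are not listed on this page, so the
# descriptors below are illustrative examples only.
import numpy as np

def handcrafted_features(signal, sr=16000):
    """Compute a small illustrative subset of handcrafted acoustic features."""
    rms = np.sqrt(np.mean(signal ** 2))                     # RMS energy
    zcr = np.mean(np.abs(np.diff(np.sign(signal))) > 0)     # zero-crossing rate
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)  # spectral centroid (Hz)
    return {"rms": rms, "zcr": zcr, "spectral_centroid": centroid}

# A 1-second 440 Hz tone: the spectral centroid should land near 440 Hz.
sr = 16000
t = np.arange(sr) / sr
feats = handcrafted_features(np.sin(2 * np.pi * 440 * t), sr)
```

In a full pipeline, descriptors like these would be concatenated with learned encoder embeddings before classification.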

📝 Abstract
Understanding how auditory stimuli influence emotional and physiological states is fundamental to advancing affective computing and mental health technologies. In this paper, we present a multimodal evaluation of the affective and physiological impacts of three auditory conditions, namely spiritual meditation (SM), music (M), and natural silence (NS), using a comprehensive suite of biometric signal measures. To facilitate this analysis, we introduce the Spiritual, Music, Silence Acoustic Time Series (SMSAT) dataset, a novel benchmark comprising acoustic time series (ATS) signals recorded under controlled exposure protocols, with careful attention to demographic diversity and experimental consistency. To model the auditory-induced states, we develop a contrastive learning-based SMSAT audio encoder that extracts highly discriminative embeddings from ATS data, achieving 99.99% classification accuracy in interclass and intraclass evaluations. Furthermore, we propose the Calmness Analysis Model (CAM), a deep learning framework integrating 25 handcrafted and learned features for affective state classification across auditory conditions, attaining robust 99.99% classification accuracy. In addition, paired t-tests and ANOVA reveal significant differences in cardiac response characteristics (CRC) across conditions, with SM inducing more pronounced physiological fluctuations. Compared with existing state-of-the-art methods reporting accuracies up to 90%, the proposed model demonstrates substantial performance gains (up to 99.99% accuracy). This work contributes a validated multimodal dataset and a scalable deep learning framework for affective computing applications in stress monitoring, mental well-being, and therapeutic audio-based interventions.
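The page does not detail the encoder's training objective. As a rough sketch, assuming a standard NT-Xent (SimCLR-style) contrastive loss over paired embeddings of the same audio clip (a common choice, not confirmed as the paper's), the loss could be written as:

```python
# Hedged sketch: a generic NT-Xent contrastive loss over paired audio
# embeddings. This is an assumed formulation, not the paper's confirmed loss.
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (N, D) embeddings of two views of the same N audio clips.
    Row i of z1 and row i of z2 are positives; all other pairs are negatives."""
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # positive indices
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))
loss_close = nt_xent_loss(z1, z1 + 0.01 * rng.normal(size=(8, 16)))  # aligned views
loss_random = nt_xent_loss(z1, rng.normal(size=(8, 16)))             # unrelated views
```

Aligned views yield a lower loss than unrelated ones, which is what drives the encoder toward discriminative embeddings.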
Problem

Research questions and friction points this paper is trying to address.

Analyzing how auditory stimuli affect emotional and physiological states
Developing a dataset (SMSAT) for modeling meditation, music, and silence impacts
Creating a deep learning framework (CAM) for high-accuracy affective state classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal dataset SMSAT for affective and physiological modeling
Contrastive learning based SMSAT audio encoder
Deep learning framework CAM with 25 features
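The statistical side of the pipeline (paired t-tests and ANOVA on cardiac response features) can be sketched on synthetic data; the variable names and effect sizes below are invented for illustration and are not the paper's measurements:

```python
# Illustrative sketch only: paired t-test and one-way ANOVA on a synthetic
# per-participant HRV-style feature under the three auditory conditions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 30                                    # hypothetical participants
hrv_sm = rng.normal(55, 8, n)             # spiritual meditation (SM)
hrv_m = hrv_sm - rng.normal(6, 3, n)      # music (M): lower on average
hrv_ns = hrv_sm - rng.normal(9, 3, n)     # natural silence (NS)

# Paired t-test: same participants under two conditions
t_sm_m, p_sm_m = stats.ttest_rel(hrv_sm, hrv_m)

# One-way ANOVA across all three conditions
f_stat, p_anova = stats.f_oneway(hrv_sm, hrv_m, hrv_ns)
```

A paired test is appropriate here because each participant is measured under every condition, which removes between-subject variance from the comparison.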
Ahmad Suleman
National Center for Physics
Autonomous Systems, Robotics, Cognition, Decision Support, Human Computer Interaction
Yazeed Alkhrijah
Department of Electrical Engineering, Imam Mohammad ibn Saud Islamic University (IMSIU), Saudi Arabia
Misha Urooj Khan
CERN (European Organization for Nuclear Research) Switzerland
QML/RL/CV/ML/DL, LLMs, Agentic AI, Industrial diagnostics
Hareem Khan
University of Engineering and Technology (UET), Taxila
Muhammad Abdullah Husnain Ali Faiz
University of Engineering and Technology (UET), Taxila
Mohamad A. Alawad
Department of Electrical Engineering, Imam Mohammad ibn Saud Islamic University (IMSIU), Saudi Arabia
Zeeshan Kaleem
King Fahd University of Petroleum and Minerals; Editor, IEEE Transactions on Vehicular Technology
AI for Communications, 5G/6G Systems, Artificial Intelligence, Radio Resource Management
Guan Gui
College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China