PPG-Based Affect Recognition with Long-Range Deep Models: A Measurement-Driven Comparison of CNN, Transformer, and Mamba Architectures

📅 2026-04-28

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This study examines whether long-range sequence architectures such as the Transformer and Mamba outperform conventional CNN or CNN-LSTM models for affect recognition from wrist-worn photoplethysmography (PPG), particularly under real-world conditions of limited data and high noise. It is the first work to apply Transformers and Mamba to PPG-based emotion recognition. Using a unified preprocessing pipeline, a common segmentation strategy, and a subject-independent 5-fold cross-validation protocol, it systematically evaluates four model families on classifying arousal, valence, and relaxation states. CNNs achieve the highest accuracy while remaining the most lightweight. Transformers yield more balanced F1 scores for arousal and relaxation. Mamba performs comparably to Transformers but does not consistently surpass CNNs. These findings offer empirically grounded guidance for model selection in wearable affective computing systems.
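Under a subject-independent protocol, every window from a given participant falls entirely on either the training side or the test side of a fold, so each model is always scored on people it has never seen. The following is a minimal plain-Python sketch of that idea under stated assumptions. The 256-sample window, 50% stride, round-robin fold assignment, and the helper names `segment`, `subject_folds`, and `subject_independent_splits` are all hypothetical and are not taken from the paper's pipeline.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Window = List[float]
Split = Tuple[List[Tuple[str, Window]], List[Tuple[str, Window]]]


def segment(signal: Sequence[float], win: int, stride: int) -> List[Window]:
    """Cut a 1-D PPG trace into fixed-length windows; a trailing partial window is dropped."""
    return [list(signal[s:s + win]) for s in range(0, len(signal) - win + 1, stride)]


def subject_folds(subject_ids: Sequence[str], k: int = 5) -> List[List[str]]:
    """Assign every unique subject to exactly one of k folds (round-robin over sorted IDs)."""
    folds: List[List[str]] = [[] for _ in range(k)]
    for i, sid in enumerate(sorted(set(subject_ids))):
        folds[i % k].append(sid)
    return folds


def subject_independent_splits(
    windows_by_subject: Dict[str, List[Window]], k: int = 5
) -> Iterator[Split]:
    """Yield (train, test) pairs in which no subject contributes windows to both sides."""
    for test_subjects in subject_folds(list(windows_by_subject), k):
        held_out = set(test_subjects)
        train = [(sid, w) for sid, ws in windows_by_subject.items()
                 if sid not in held_out for w in ws]
        test = [(sid, w) for sid, ws in windows_by_subject.items()
                if sid in held_out for w in ws]
        yield train, test


if __name__ == "__main__":
    # Hypothetical data: 10 subjects, 640 samples each, 256-sample windows with 50% overlap.
    data = {f"s{i:02d}": segment([float(t) for t in range(640)], win=256, stride=128)
            for i in range(10)}
    for fold, (train, test) in enumerate(subject_independent_splits(data, k=5)):
        train_ids = {sid for sid, _ in train}
        test_ids = {sid for sid, _ in test}
        assert not train_ids & test_ids  # no subject leaks across the split
        # each fold: 32 train windows (8 subjects), 8 test windows (2 subjects)
        print(fold, len(train), len(test))
```

Sorting subject IDs and assigning them round-robin keeps the folds deterministic. A real pipeline might instead shuffle subjects with a fixed seed or balance folds by label distribution.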

📝 Abstract

Photoplethysmography (PPG) is increasingly used in wearable affective computing because of its low cost and ease of integration into consumer devices. Recent advances in deep learning have produced long-range sequence models, such as Transformers, and state-space models, such as Mamba, which perform strongly on natural language and general time-series tasks. However, it remains unclear whether these architectures offer tangible benefits over the widely used Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks for PPG-based affect recognition, where datasets are typically small and noisy. This work presents a measurement-driven comparison of four deep learning architectures (CNN, a CNN-LSTM hybrid, Transformer, and Mamba) for classifying arousal, valence, and relaxation states from wrist-based PPG signals. All models are evaluated under a subject-independent 5-fold cross-validation protocol with identical preprocessing, segmentation, and training pipelines. Our results show that the Transformer and Mamba models perform comparably to a CNN baseline but do not consistently outperform it across all tasks. CNNs remain the most effective overall, achieving the highest accuracy with the smallest model size, whereas Transformers achieve a better balance of F1 scores for arousal and relaxation. The study provides the first evaluation of Transformer and Mamba models for PPG-based affect recognition and offers practical guidance on model selection for wearable affective monitoring systems.
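The abstract separates overall accuracy from the balance of F1 scores across classes. On imbalanced affect labels these two can disagree. A model that mostly predicts the majority class can reach high accuracy while doing poorly on the minority class. The following sketch uses hypothetical labels, not the paper's data. It computes accuracy and macro-averaged F1 (the unweighted mean of per-class F1) to show that effect. Whether the paper summarizes F1 balance with macro-F1 specifically is an assumption here, and the function names are illustrative.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """F1 = 2*TP / (2*TP + FP + FN), computed one-vs-rest for every class seen."""
    scores: Dict[int, float] = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1, so minority classes count as much as majority ones."""
    f1 = per_class_f1(y_true, y_pred)
    return sum(f1.values()) / len(f1)


if __name__ == "__main__":
    # Hypothetical binary labels (0 = low arousal, 1 = high arousal) with an 8:2 imbalance.
    y_true = [0] * 8 + [1] * 2
    majority_model = [0] * 10                        # always predicts class 0
    balanced_model = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]  # gives up some accuracy to catch class 1
    for name, y_pred in [("majority", majority_model), ("balanced", balanced_model)]:
        print(name, f"{accuracy(y_true, y_pred):.2f}", f"{macro_f1(y_true, y_pred):.3f}")
    # majority 0.80 0.444
    # balanced 0.70 0.600
```

Here the "majority" model wins on accuracy but scores zero F1 on class 1, so its macro-F1 drops. Reporting both metrics makes this trade-off visible. Which one matters more depends on how costly it is to miss the minority affective state in the target application.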

Problem

Research questions and friction points this paper is trying to address.

PPG

affect recognition

deep learning architectures

long-range models

wearable sensing

Innovation

Methods, ideas, or system contributions that make the work stand out.

PPG-based affect recognition

Transformer

Mamba