Feature Selection via Graph Topology Inference for Soundscape Emotion Recognition

📅 2025-09-20
🤖 AI Summary
This paper addresses the opacity and unreliability of current models of the relationships between acoustic features and the affective dimensions of arousal and valence in soundscape emotion recognition (SER). It proposes a graph topology inference framework based on linear structural equation models (SEM) that combines an information criterion with a generalized elbow detector. The framework automatically learns sparse directed graphs that show how features contribute to the emotional outputs, and it quantifies the uncertainty in the choice of sparsity level. Experiments on the Emo-Soundscapes dataset evaluate the resulting feature selection and provide interpretable visualizations of feature–emotion relationships. The learned graph also reveals a strong statistical association between arousal and valence, challenging the common assumption that the two dimensions are orthogonal, and offers an interpretable approach to SER modeling.
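The summary describes choosing the sparsity level with an information criterion. The paper's own criterion is not specified here, so the following is only a sketch under stated assumptions: it swaps in the standard Bayesian information criterion (BIC) and scores a greedy forward-stepwise path that regresses one emotional output (for example arousal) on the acoustic features. The function name `bic_path` and the synthetic data are hypothetical.

```python
import numpy as np


def bic_path(X, y):
    """Greedy forward-stepwise OLS path scored with BIC.

    Hypothetical stand-in for the paper's information criterion. Returns the
    order in which features enter and the BIC for model sizes 0..p, each up to
    an additive constant (intercept and noise variance are shared by all sizes).
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)  # centring removes the intercept from the fit
    yc = y - y.mean()

    def rss(cols):
        # Residual sum of squares of the least-squares fit on `cols`.
        if not cols:
            return float(yc @ yc)
        beta, *_ = np.linalg.lstsq(Xc[:, cols], yc, rcond=None)
        r = yc - Xc[:, cols] @ beta
        return float(r @ r)

    selected, remaining = [], list(range(p))
    bics = [n * np.log(rss([]) / n)]  # empty model, no slope parameters
    while remaining:
        # Add the feature that lowers the RSS the most.
        best = min(remaining, key=lambda j: rss(selected + [j]))
        selected.append(best)
        remaining.remove(best)
        bics.append(n * np.log(rss(selected) / n) + len(selected) * np.log(n))
    return selected, np.array(bics)


# Usage on synthetic data: only features 0 and 3 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * rng.normal(size=200)
order, bics = bic_path(X, y)
best_size = int(np.argmin(bics))
```

The BIC curve can be minimised directly, or its shape can be passed to an elbow detector; combining a criterion with an elbow detector is the route the summary describes, with the paper's own criterion in place of BIC.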

📝 Abstract
Research on soundscapes has shifted the focus of environmental acoustics from noise levels to the perception of sounds, incorporating contextual factors. Soundscape emotion recognition (SER) models perception using a set of features, with arousal and valence commonly regarded as sufficient descriptors of affect. In this work, we blend graph learning techniques with a novel information criterion to develop a feature selection framework for SER. Specifically, we estimate a sparse graph representation of feature relations using linear structural equation models (SEM) tailored to the widely used Emo-Soundscapes dataset. The resulting graph captures the relations between input features and the two emotional outputs. To determine the appropriate level of sparsity, we propose a novel generalized elbow detector, which provides both a point estimate and an uncertainty interval. We conduct an extensive evaluation of our methods, including visualizations of the inferred relations. While several of our findings align with previous studies, the graph representation also reveals a strong connection between arousal and valence, challenging common SER assumptions.
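The abstract describes estimating a sparse graph with a linear SEM. One common formulation, assumed here rather than taken from the paper, writes the data matrix as X = XA + E with a zero diagonal in A, reads a nonzero A[i, j] as a directed edge from variable i to variable j, and estimates each column of A by an l1-penalised (lasso) regression of one variable on all the others. In an SER setting the columns of X would hold the acoustic features plus the arousal and valence ratings. The helper names `lasso_cd` and `sem_graph` and the penalty value are hypothetical, and this sketch does not enforce acyclicity.

```python
import numpy as np


def soft_threshold(z, t):
    # Proximal operator of t * |.|: shrinks z towards zero by t.
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1 / 2n) * ||y - X w||^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # residual y - X w, with w = 0 initially
    for _ in range(n_iter):
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            r += X[:, k] * w[k]          # remove coordinate k from the fit
            rho = X[:, k] @ r / n        # correlation with the partial residual
            w[k] = soft_threshold(rho, lam) / col_sq[k]
            r -= X[:, k] * w[k]          # add the updated coordinate back
    return w


def sem_graph(X, lam):
    """Sparse A with zero diagonal such that X is approximated by X A.

    A[i, j] != 0 is read as a directed edge i -> j. Assumed column-wise lasso
    formulation, not the paper's estimator; no acyclicity constraint.
    """
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise every variable
    A = np.zeros((p, p))
    for j in range(p):
        others = [i for i in range(p) if i != j]
        A[others, j] = lasso_cd(Z[:, others], Z[:, j], lam)
    return A


# Usage on synthetic data: x1 depends on x0, x2 is unrelated noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 0.8 * x0 + 0.3 * rng.normal(size=500)
x2 = rng.normal(size=500)
A = sem_graph(np.column_stack([x0, x1, x2]), lam=0.1)
```

Larger `lam` values give sparser graphs. Sweeping `lam` and scoring each resulting graph is the point where a sparsity-selection rule, such as the paper's information criterion and elbow detector, would come in.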

Problem

Developing a feature selection framework for soundscape emotion recognition
Estimating a sparse graph representation of feature relations using SEM
Challenging common SER assumptions about the arousal–valence relationship

Innovation

Graph learning with a novel information criterion for feature selection
Sparse graph representation using linear structural equation models
Generalized elbow detector for selecting the sparsity level, with an uncertainty interval
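The generalized elbow detector itself is not specified in this summary. As a rough stand-in, the sketch below applies the familiar maximum-distance-to-chord elbow heuristic to a criterion curve over candidate sparsity levels, and reports as a crude band the range of levels whose distance lies within a tolerance `tol` of the maximum. The function name `elbow`, the tolerance rule and the example values are assumptions; this band is not a statistical uncertainty interval and is not the paper's detector.

```python
import numpy as np


def elbow(levels, values, tol=0.1):
    """Max-distance-to-chord elbow with a tolerance band (hypothetical sketch).

    levels: increasing candidate sparsity levels; values: criterion per level.
    Returns (point estimate, (low, high)), where the band spans every level
    whose distance is at least (1 - tol) times the maximum distance.
    """
    k = np.asarray(levels, dtype=float)
    f = np.asarray(values, dtype=float)
    # Rescale both axes to [0, 1] so the distance does not depend on units.
    kn = (k - k[0]) / (k[-1] - k[0])
    fn = (f - f.min()) / (f.max() - f.min())
    # Chord from the first to the last point of the curve.
    x0, y0 = kn[0], fn[0]
    dx, dy = kn[-1] - x0, fn[-1] - y0
    # Perpendicular distance of every point to the chord.
    dist = np.abs(dx * (fn - y0) - dy * (kn - x0)) / np.hypot(dx, dy)
    i_star = int(np.argmax(dist))
    near = np.flatnonzero(dist >= (1 - tol) * dist[i_star])
    return k[i_star], (k[near.min()], k[near.max()])


# Usage: a criterion that drops quickly and then flattens out.
levels = list(range(11))
values = [100, 40, 15, 10, 8, 7, 6, 5, 4, 3, 2]
k_hat, (lo, hi) = elbow(levels, values)
```

For this curve the estimate is level 2 and the band covers levels 2 to 3. The chord heuristic assumes a roughly monotone curve, so a U-shaped criterion curve would need to be truncated at its minimum before being passed in.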

Samuel Rey
Universidad Rey Juan Carlos
Graph Signal Processing, Signal Processing, GNN, Deep Learning

Luca Martino
Associate Professor - University of Catania
Bayesian inference, computational methods (MCMC, particle filters, exact sampling, etc.)

Roberto San Millán-Castillo
Department of Signal Theory and Communications, Universidad Rey Juan Carlos, Madrid, Spain

Eduardo Morgado
Department of Signal Theory and Communications, Universidad Rey Juan Carlos, Madrid, Spain