Enabling Automatic Self-Talk Detection via Earables

📅 2025-11-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the challenging problem of automatically detecting vocalized self-talk in real-world settings, where high acoustic variability, semantic and syntactic incompleteness, and sparse, irregular occurrence fundamentally violate the assumptions underlying conventional speech models. To this end, we propose MutterMeter, a system that captures audio via ear-worn microphones and employs a hierarchical classification architecture that progressively integrates acoustic features, local linguistic cues, and contextual sequence information. Its lightweight pipeline is optimized for on-device execution, balancing accuracy and efficiency at the edge. Evaluated on 31.1 hours of naturalistic audio from 25 participants, MutterMeter achieves a macro-F1 score of 0.84, significantly outperforming baselines based on speech emotion recognition and large language models. To our knowledge, this is the first work to enable high-accuracy, low-latency detection of everyday self-talk, establishing a novel paradigm for cognitive and affective computing.
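To make the described architecture concrete, here is a minimal Python sketch of one way a hierarchical, early-exit cascade like this could be organized: cheap acoustic scoring first, costlier linguistic and contextual stages only when earlier stages are unsure. The stage functions, the CONFIDENT threshold, and all scores are illustrative placeholders, not MutterMeter's actual models.

```python
import numpy as np

# Hypothetical early-exit threshold: a clip scored confidently as
# self-talk (or confidently not) leaves the cascade immediately;
# only ambiguous clips pay for the costlier later stages.
CONFIDENT = 0.9

def acoustic_stage(clip: np.ndarray) -> float:
    """Stage 1: cheap score from acoustic features (placeholder model)."""
    return float(np.clip(np.std(clip), 0.0, 1.0))

def linguistic_stage(clip: np.ndarray) -> float:
    """Stage 2: local linguistic cues, e.g. from a lightweight transcript
    (placeholder model)."""
    return 0.5

def contextual_stage(clip: np.ndarray, history: list[float]) -> float:
    """Stage 3: contextual sequence information over recent clips
    (placeholder model)."""
    return float(np.mean(history)) if history else 0.5

def detect_self_talk(clip: np.ndarray, history: list[float]) -> bool:
    """Run stages in order of cost, exiting as soon as one is confident."""
    stages = (
        lambda: acoustic_stage(clip),
        lambda: linguistic_stage(clip),           # runs only if stage 1 was unsure
        lambda: contextual_stage(clip, history),  # runs only if stage 2 was unsure
    )
    score = 0.5
    for run_stage in stages:
        score = run_stage()
        if score >= CONFIDENT:
            return True
        if score <= 1.0 - CONFIDENT:
            return False
    return score >= 0.5  # no stage was confident: use the final stage's score

# Example: classify one second of synthetic audio at 16 kHz.
clip = np.random.randn(16_000).astype(np.float32)
print(detect_self_talk(clip, history=[0.2, 0.7]))
```

The design choice the sketch illustrates is that most audio in the wild is clearly not self-talk, so a confident first stage lets the system skip the expensive stages for the bulk of clips, which is what makes on-device execution plausible.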

📝 Abstract
Self-talk, an internal dialogue that can occur silently or be spoken aloud, plays a crucial role in emotional regulation, cognitive processing, and motivation, yet has remained largely invisible and unmeasurable in everyday life. In this paper, we present MutterMeter, a mobile system that automatically detects vocalized self-talk from audio captured by earable microphones in real-world settings. Detecting self-talk is technically challenging due to its diverse acoustic forms, semantic and grammatical incompleteness, and irregular occurrence patterns, which differ fundamentally from assumptions underlying conventional speech understanding models. To address these challenges, MutterMeter employs a hierarchical classification architecture that progressively integrates acoustic, linguistic, and contextual information through a sequential processing pipeline, adaptively balancing accuracy and computational efficiency. We build and evaluate MutterMeter using a first-of-its-kind dataset comprising 31.1 hours of audio collected from 25 participants. Experimental results demonstrate that MutterMeter achieves robust performance with a macro-averaged F1 score of 0.84, outperforming conventional approaches, including LLM-based and speech emotion recognition models.
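As context for the headline number, macro-averaged F1 is the unweighted mean of per-class F1 scores, so the sparse self-talk class counts exactly as much as the abundant non-self-talk class; a high macro-F1 therefore cannot be reached by simply predicting the majority class. A minimal sketch with hypothetical confusion counts (not taken from the paper):

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision P = tp/(tp+fp) and recall R = tp/(tp+fn)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion counts (NOT from the paper): self-talk is the
# sparse positive class; everything else is the abundant negative class.
f1_self_talk = f1(tp=80, fp=25, fn=20)    # ~0.78
f1_other = f1(tp=900, fp=20, fn=25)       # ~0.98
macro_f1 = (f1_self_talk + f1_other) / 2  # unweighted mean over classes
print(round(macro_f1, 2))                 # 0.88
```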
Problem

Research questions and friction points this paper is trying to address.

Detecting vocalized self-talk in real-world settings
Addressing acoustic diversity and semantic incompleteness of self-talk
Overcoming sparse, irregular occurrence patterns that confound conventional speech models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical classification integrates acoustic, linguistic, and contextual information
Earable microphones capture real-world audio for self-talk detection
Sequential processing balances accuracy with computational efficiency
👥 Authors
Euihyeok Lee
Korea University of Technology and Education, Republic of Korea
Seonghyeon Kim
Ph.D. Student at KAIST, Visual Media Lab (Computer Graphics)
Sanghun Im
Korea University of Technology and Education, Republic of Korea
Heung-Seon Oh
Korea University of Technology and Education, Republic of Korea
Seungwoo Kang
Korea University of Technology and Education, Republic of Korea