Enabling Automatic Self-Talk Detection via Earables

📅 2025-11-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the challenging problem of automatically detecting vocalized self-talk in real-world settings, where high acoustic variability, semantic and syntactic incompleteness, and sparse, irregular occurrence fundamentally violate the assumptions underlying conventional speech models. To this end, we propose MutterMeter, a system that captures audio via ear-worn microphones and employs a hierarchical classification architecture that progressively integrates acoustic features, local linguistic cues, and contextual sequence information. Its lightweight pipeline is optimized for on-device execution, balancing accuracy and efficiency at the edge. Evaluated on 31.1 hours of naturalistic audio from 25 participants, MutterMeter achieves a macro-F1 score of 0.84, significantly outperforming baselines based on speech emotion recognition and large language models. To our knowledge, this is the first work to enable high-accuracy, low-latency detection of everyday self-talk, establishing a novel paradigm for cognitive and affective computing.
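To make the described architecture concrete, here is a minimal Python sketch of one way a hierarchical, early-exit cascade like this could be organized: cheap acoustic scoring first, costlier linguistic and contextual stages only when earlier stages are unsure. The stage functions, the CONFIDENT threshold, and all scores are illustrative placeholders, not MutterMeter's actual models.

```python
import numpy as np

# Hypothetical early-exit threshold: a clip scored confidently as
# self-talk (or confidently not) leaves the cascade immediately;
# only ambiguous clips pay for the costlier later stages.
CONFIDENT = 0.9

def acoustic_stage(clip: np.ndarray) -> float:
    """Stage 1: cheap score from acoustic features (placeholder model)."""
    return float(np.clip(np.std(clip), 0.0, 1.0))

def linguistic_stage(clip: np.ndarray) -> float:
    """Stage 2: local linguistic cues, e.g. from a lightweight transcript
    (placeholder model)."""
    return 0.5

def contextual_stage(clip: np.ndarray, history: list[float]) -> float:
    """Stage 3: contextual sequence information over recent clips
    (placeholder model)."""
    return float(np.mean(history)) if history else 0.5

def detect_self_talk(clip: np.ndarray, history: list[float]) -> bool:
    """Run stages in order of cost, exiting as soon as one is confident."""
    stages = (
        lambda: acoustic_stage(clip),
        lambda: linguistic_stage(clip),           # runs only if stage 1 was unsure
        lambda: contextual_stage(clip, history),  # runs only if stage 2 was unsure
    )
    score = 0.5
    for run_stage in stages:
        score = run_stage()
        if score >= CONFIDENT:
            return True
        if score <= 1.0 - CONFIDENT:
            return False
    return score >= 0.5  # no stage was confident: use the final stage's score

# Example: classify one second of synthetic audio at 16 kHz.
clip = np.random.randn(16_000).astype(np.float32)
print(detect_self_talk(clip, history=[0.2, 0.7]))
```

The design choice the sketch illustrates is that most audio in the wild is clearly not self-talk, so a confident first stage lets the system skip the expensive stages for the bulk of clips, which is what makes on-device execution plausible.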

📝 Abstract
Self-talk, an internal dialogue that can occur silently or be spoken aloud, plays a crucial role in emotional regulation, cognitive processing, and motivation, yet has remained largely invisible and unmeasurable in everyday life. In this paper, we present MutterMeter, a mobile system that automatically detects vocalized self-talk from audio captured by earable microphones in real-world settings. Detecting self-talk is technically challenging due to its diverse acoustic forms, semantic and grammatical incompleteness, and irregular occurrence patterns, which differ fundamentally from assumptions underlying conventional speech understanding models. To address these challenges, MutterMeter employs a hierarchical classification architecture that progressively integrates acoustic, linguistic, and contextual information through a sequential processing pipeline, adaptively balancing accuracy and computational efficiency. We build and evaluate MutterMeter using a first-of-its-kind dataset comprising 31.1 hours of audio collected from 25 participants. Experimental results demonstrate that MutterMeter achieves robust performance with a macro-averaged F1 score of 0.84, outperforming conventional approaches, including LLM-based and speech emotion recognition models.
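As context for the headline number, macro-averaged F1 is the unweighted mean of per-class F1 scores, so the sparse self-talk class counts exactly as much as the abundant non-self-talk class; a high macro-F1 therefore cannot be reached by simply predicting the majority class. A minimal sketch with hypothetical confusion counts (not taken from the paper):

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision P = tp/(tp+fp) and recall R = tp/(tp+fn)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion counts (NOT from the paper): self-talk is the
# sparse positive class; everything else is the abundant negative class.
f1_self_talk = f1(tp=80, fp=25, fn=20)    # ~0.78
f1_other = f1(tp=900, fp=20, fn=25)       # ~0.98
macro_f1 = (f1_self_talk + f1_other) / 2  # unweighted mean over classes
print(round(macro_f1, 2))                 # 0.88
```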
Problem

Research questions and friction points this paper is trying to address.

Detecting vocalized self-talk in real-world settings
Addressing acoustic diversity and semantic incompleteness of self-talk
Overcoming sparse, irregular occurrence patterns that confound conventional speech models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical classification integrates acoustic, linguistic, and contextual information
Earable microphones capture real-world audio for self-talk detection
Sequential processing balances accuracy with computational efficiency
👥 Authors
Euihyeok Lee
Korea University of Technology and Education, Republic of Korea
Seonghyeon Kim
Ph.D. Student at KAIST, Visual Media Lab (Computer Graphics)
Sanghun Im
Korea University of Technology and Education, Republic of Korea
Heung-Seon Oh
Korea University of Technology and Education, Republic of Korea
Seungwoo Kang
Korea University of Technology and Education, Republic of Korea