Poster: Recognizing Hidden-in-the-Ear Private Key for Reliable Silent Speech Interface Using Multi-Task Learning

📅 2025-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of jointly achieving silent speech recognition and speaker authentication in silent speech interfaces (SSIs), this paper proposes HEar-ID: an end-to-end joint modeling framework leveraging off-the-shelf active noise-cancelling earbuds. HEar-ID simultaneously captures low-frequency “whisper” audio from the ear canal and high-frequency ultrasonic echo signals, employing a lightweight shared encoder for multi-task learning, augmented by contrastive learning and cross-modal feature alignment. The authors report that it is the first approach to concurrently achieve 50-word-level silent spelling recognition and biometric-key-level speaker authentication on a single device with a single model, requiring no additional hardware or explicit user cooperation. Experiments demonstrate that HEar-ID maintains high spelling accuracy while significantly improving impostor rejection performance. This work establishes a new paradigm for seamless, privacy-preserving authentication in sensitive applications.
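The summary does not spell out the training objective, but its description (multi-task learning with a contrastive authentication branch and cross-modal feature alignment) suggests a combined loss along the following lines. This is only a sketch under those assumptions: the function name `hear_id_loss`, the weights `w_spell`, `w_auth`, `w_align`, and the supervised-contrastive formulation are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def hear_id_loss(spell_logits, spell_labels,
                 auth_emb, speaker_labels,
                 audio_emb, echo_emb,
                 w_spell=1.0, w_auth=1.0, w_align=0.5, temperature=0.1):
    """Hypothetical multi-task objective combining spelling recognition,
    contrastive speaker authentication, and cross-modal alignment."""
    # 1) Silent spelling recognition: cross-entropy over the word vocabulary.
    loss_spell = F.cross_entropy(spell_logits, spell_labels)

    # 2) Speaker authentication: a supervised contrastive term that pulls
    #    same-speaker embeddings together and pushes different speakers apart.
    emb = F.normalize(auth_emb, dim=-1)
    sim = emb @ emb.t() / temperature
    off_diag = ~torch.eye(len(emb), dtype=torch.bool, device=emb.device)
    same_spk = (speaker_labels.unsqueeze(0) == speaker_labels.unsqueeze(1)) & off_diag
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(~off_diag, float('-inf')), dim=1, keepdim=True)
    loss_auth = -(log_prob * same_spk.float()).sum(1) / same_spk.float().sum(1).clamp(min=1)
    loss_auth = loss_auth.mean()

    # 3) Cross-modal alignment: encourage the whisper-audio and ultrasonic-echo
    #    embeddings of the same utterance to agree.
    loss_align = 1 - F.cosine_similarity(audio_emb, echo_emb, dim=-1).mean()

    return w_spell * loss_spell + w_auth * loss_auth + w_align * loss_align
```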

📝 Abstract
Silent speech interface (SSI) enables hands-free input without audible vocalization, but most SSI systems do not verify speaker identity. We present HEar-ID, which uses consumer active noise-canceling earbuds to capture low-frequency "whisper" audio and high-frequency ultrasonic reflections. Features from both streams pass through a shared encoder, producing embeddings that feed a contrastive branch for user authentication and an SSI head for silent spelling recognition. This design supports decoding of 50 words while reliably rejecting impostors, all on commodity earbuds with a single model. Experiments demonstrate that HEar-ID achieves strong spelling accuracy and robust authentication.
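As a rough illustration of the data flow described above (two feature streams, a lightweight shared encoder, a contrastive authentication branch, and a spelling head), a minimal PyTorch sketch might look as follows. All layer choices and dimensions (`feat_dim`, `emb_dim`, the GRU encoder) are assumptions for illustration; the paper's actual architecture is not specified here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HEarIDSketch(nn.Module):
    """Minimal sketch of a dual-stream, shared-encoder, two-head design.
    Layer types and sizes are assumptions, not the authors' implementation."""
    def __init__(self, feat_dim=64, emb_dim=128, num_words=50):
        super().__init__()
        # Per-modality front-ends: low-frequency whisper audio and ultrasonic echoes.
        self.audio_proj = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.ReLU())
        self.echo_proj = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.ReLU())
        # Lightweight shared encoder over the fused (concatenated) features.
        self.shared_encoder = nn.GRU(2 * emb_dim, emb_dim, batch_first=True)
        # Task heads: contrastive embedding for authentication, classifier for spelling.
        self.auth_head = nn.Linear(emb_dim, emb_dim)
        self.spell_head = nn.Linear(emb_dim, num_words)

    def forward(self, audio_feats, echo_feats):
        # audio_feats, echo_feats: (batch, time, feat_dim) frame-level features.
        fused = torch.cat([self.audio_proj(audio_feats),
                           self.echo_proj(echo_feats)], dim=-1)
        _, h = self.shared_encoder(fused)      # h: (1, batch, emb_dim)
        shared = h.squeeze(0)
        auth_emb = F.normalize(self.auth_head(shared), dim=-1)
        spell_logits = self.spell_head(shared)
        return auth_emb, spell_logits
```

The point of the sketch is that both task heads read the same shared embedding, which is what allows one model to serve recognition and authentication at once.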
Problem

Research questions and friction points this paper is trying to address.

How to authenticate users via ear biometrics during silent speech
How to decode silent speech commands without audible vocalization
How to integrate authentication and recognition in a single earbud model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses earbuds to capture whisper audio and ultrasonic reflections
Employs shared encoder with multi-task learning for authentication and recognition
Operates on commodity earbuds with a single model for dual functions (see the inference sketch after this list)
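To make the dual-function idea concrete, the hypothetical snippet below shows how a single forward pass could serve both tasks at inference time: the speaker embedding is compared against an embedding stored at enrollment, and the spelling prediction is released only if the match clears a threshold. It reuses the `HEarIDSketch` model from the earlier sketch; the threshold value and the cosine-similarity decision rule are assumptions, not the paper's protocol.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def authenticate_and_decode(model, audio_feats, echo_feats,
                            enrolled_embedding, threshold=0.7):
    """Hypothetical inference flow: one forward pass yields both the speaker
    embedding and the spelling prediction; the decoded command is released
    only if the embedding matches the enrolled user's in-ear 'key'."""
    # audio_feats, echo_feats: single-utterance inputs of shape (1, time, feat_dim).
    auth_emb, spell_logits = model(audio_feats, echo_feats)
    # Compare against the embedding stored at enrollment time.
    score = F.cosine_similarity(auth_emb, enrolled_embedding, dim=-1)
    if score.item() < threshold:
        return None  # impostor rejected: no command is decoded
    return spell_logits.argmax(dim=-1).item()  # index into the 50-word vocabulary
```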
Xuefu Dong
The University of Tokyo, Tokyo, Japan
Liqiang Xu
The University of Tokyo, Tokyo, Japan
Lixing He
The Chinese University of Hong Kong, Hong Kong SAR, China
Zengyi Han
Dalian Maritime University, Dalian, China
Ken Christofferson
University of Toronto, Toronto, Ontario, Canada
Yifei Chen
Tsinghua University, Beijing, China
Akihito Taya
The University of Tokyo, Tokyo, Japan
Yuuki Nishiyama
The University of Tokyo, Chiba, Japan
Kaoru Sezaki
Professor, Center for Spatial Information Science, University of Tokyo
Sensor networks · eHealth · Spatial information science · Communication networks