🤖 AI Summary
To jointly achieve silent speech recognition and speaker authentication in silent speech interfaces (SSIs), this paper proposes HEar-ID, an end-to-end joint modeling framework built on off-the-shelf active noise-cancelling earbuds. HEar-ID simultaneously captures low-frequency “whisper” audio from the ear canal and high-frequency ultrasonic echo signals, feeding both into a lightweight shared encoder trained with multi-task learning, augmented by contrastive learning and cross-modal feature alignment. To our knowledge, it is the first approach to concurrently achieve silent spelling recognition over a 50-word vocabulary and biometric speaker authentication on a single device with a single model, requiring no additional hardware or explicit user cooperation. Experiments show that HEar-ID maintains high spelling accuracy while substantially improving impostor rejection. This work establishes a new paradigm for seamless, privacy-preserving authentication in sensitive applications.
📝 Abstract
Silent speech interfaces (SSIs) enable hands-free input without audible vocalization, but most SSI systems do not verify speaker identity. We present HEar-ID, which uses consumer active noise-cancelling earbuds to capture low-frequency "whisper" audio and high-frequency ultrasonic reflections. Features from both streams pass through a shared encoder, producing embeddings that feed a contrastive branch for user authentication and an SSI head for silent spelling recognition. This design supports decoding a 50-word vocabulary while reliably rejecting impostors, all on commodity earbuds with a single model. Experiments demonstrate that HEar-ID achieves strong spelling accuracy and robust authentication.
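The shared-encoder, dual-head design described above can be sketched minimally as follows. This is an illustrative NumPy skeleton, not the paper's implementation: all layer sizes, the random weights, the enrolled template, and the acceptance threshold are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper does not publish its layer sizes.
FEAT_DIM = 128   # fused per-frame features from the whisper-audio + ultrasound streams
EMB_DIM = 64     # shared embedding size
VOCAB = 50       # 50-word spelling vocabulary

# Shared encoder: a single linear layer + ReLU stands in for the
# lightweight encoder that both tasks reuse.
W_enc = rng.normal(0.0, 0.1, (FEAT_DIM, EMB_DIM))

def encode(x):
    return np.maximum(x @ W_enc, 0.0)

# SSI head: linear classifier over the 50-word vocabulary.
W_ssi = rng.normal(0.0, 0.1, (EMB_DIM, VOCAB))

def spell_logits(emb):
    return emb @ W_ssi

# Authentication branch: cosine similarity between the utterance
# embedding and an enrolled user template, thresholded to accept/reject.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def authenticate(emb, template, threshold=0.5):  # threshold is illustrative
    return cosine(emb, template) >= threshold

# Toy forward pass over one fused feature frame.
x = rng.normal(size=FEAT_DIM)
emb = encode(x)
word_id = int(np.argmax(spell_logits(emb)))        # predicted word index
template = encode(rng.normal(size=FEAT_DIM))       # stand-in enrolled embedding
accepted = authenticate(emb, template)             # True/False decision
```

Because both heads read the same embedding, a contrastive loss on the authentication branch and a classification loss on the SSI head can be trained jointly against the shared encoder, which is the multi-task structure the abstract describes.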