DESAMO: A Device for Elder-Friendly Smart Homes Powered by Embedded LLM with Audio Modality

2025-08-26
AI Summary
To address the challenges of poor speech recognition for elderly users' disfluent speech and the inability of conventional ASR-LLM cascaded systems to detect non-speech events (e.g., falls, cries for help), this paper proposes DESAMO, the first edge-deployed, embedded Audio Large Language Model (Audio LLM) system tailored for elderly-friendly smart homes. DESAMO eliminates reliance on ASR by performing multi-granularity audio understanding directly on-device from raw waveforms, jointly modeling natural speech and critical non-speech events while ensuring real-time responsiveness, robustness, and on-device privacy preservation. Experiments demonstrate significant improvements in both elderly speech recognition accuracy and emergency event detection reliability, with average inference latency under 200 ms and end-to-end local data processing. Its core contributions are: (1) the first efficient deployment of an Audio LLM on resource-constrained embedded hardware, and (2) a novel end-to-end audio semantic understanding paradigm specifically designed for elderly users.

Abstract
We present DESAMO, an on-device, elder-friendly smart home system powered by an Audio LLM that supports natural and private interactions. Conventional voice assistants rely on ASR-based pipelines or ASR-LLM cascades, which often struggle with the unclear speech common among elderly users and cannot handle non-speech audio. DESAMO instead uses an Audio LLM to process raw audio input directly, enabling a robust understanding of user intent and of critical events such as falls or calls for help.
Problem

Research questions and friction points this paper is trying to address.

Processes raw audio input directly for elderly users
Handles unclear speech and non-speech audio events
Enables private natural interactions in smart homes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Audio LLM processes raw audio directly
On-device system ensures private elder interactions
Detects critical events such as falls and calls for help
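
The points above describe one model handling both spoken requests and critical sounds. The sketch below illustrates how such output might be routed once the Audio LLM has interpreted a clip. It is a minimal illustration under stated assumptions, not the authors' implementation: `AudioUnderstanding`, `route`, the `kind` values, and every label string are hypothetical names introduced here, and the Audio LLM itself is not modeled.

```python
from dataclasses import dataclass

# Hypothetical labels. The source names only falls and calls for help
# as examples of critical events.
CRITICAL_LABELS = {"fall", "call_for_help"}


@dataclass(frozen=True)
class AudioUnderstanding:
    """Hypothetical structured output of an on-device Audio LLM for one clip."""
    kind: str   # "speech" for a spoken request, "event" for a non-speech sound
    label: str  # e.g. "turn_on_light" (intent) or "fall" (event)


def route(result: AudioUnderstanding) -> str:
    """Map one model output to a smart-home action (illustrative policy only)."""
    if result.kind not in ("speech", "event"):
        raise ValueError(f"unknown kind: {result.kind!r}")
    # Critical labels raise an alert whether they arrive as speech or as sound.
    if result.label in CRITICAL_LABELS:
        return f"ALERT:{result.label}"
    if result.kind == "speech":
        return f"COMMAND:{result.label}"
    # Non-critical, non-speech audio needs no action. An ASR front end
    # would not have produced a transcript for it in the first place.
    return "IGNORE"
```

Because routing operates on the model's interpretation rather than on a transcript, a fall sound and a spoken request pass through the same code path, which is the property the source attributes to processing raw audio directly.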
Authors

Youngwon Choi, MAUM AI Inc. (Conversational AI)
Donghyuk Jung, Korea Culture Technology Institute, Gwangju, Republic of Korea
Hwayeon Kim, MAUM AI Inc., Seongnam, Republic of Korea