🤖 AI Summary
This work proposes the first end-to-end real-time framework for surface vibration-based gesture recognition, addressing the limitation of existing approaches that focus only on isolated components. The system integrates piezoelectric sensing, configurable signal preprocessing, and lightweight model training into a unified pipeline. A modular data processing chain is established using bandpass filtering, fixed-length windowing, and min–max normalization, followed by a depthwise separable one-dimensional convolutional neural network containing only 8,722 parameters. Evaluated on a dataset of six gestures collected from 15 participants, the framework achieves consistently high accuracy across multiple partitioning strategies, with particularly strong performance in user-independent leave-one-subject-out cross-validation, thereby demonstrating the effectiveness of jointly optimizing preprocessing steps and model hyperparameters.
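To illustrate why a depthwise separable 1D convolution keeps the parameter count small, the following sketch compares it with a standard 1D convolution of the same shape. The channel counts and kernel size are hypothetical values chosen for illustration only; they are not the configuration behind the reported 8,722 parameters.

```python
def conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Parameter count of a standard 1D convolution layer."""
    return c_in * c_out * kernel_size + (c_out if bias else 0)


def separable_conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Parameter count of a depthwise separable 1D convolution.

    Depthwise step: one kernel of length `kernel_size` per input channel.
    Pointwise step: a 1x1 convolution that mixes channels.
    """
    depthwise = c_in * kernel_size + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer shape (assumption, not taken from the paper).
standard = conv1d_params(c_in=16, c_out=32, kernel_size=9)
separable = separable_conv1d_params(c_in=16, c_out=32, kernel_size=9)
print(standard, separable)
```

For this assumed layer shape, the separable variant needs far fewer weights than the standard one, which is the general mechanism that makes such compact models possible.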
📝 Abstract
Sensing surface vibrations promises unobtrusive interaction for smart home systems by enabling gesture recognition on existing everyday surfaces without disturbing living-space design. Existing approaches typically address only parts of the processing chain, such as sensing hardware or offline gesture recognition, rather than providing an end-to-end system from surface-mounted sensors to the evaluation of the prediction model. This paper presents a custom sensor system and a configurable data-to-model pipeline for gesture recognition on a standard office desk. Our hardware enables low-noise sensing of the vibrations using piezoelectric sensors. Building on a modular signal-processing framework, we model the full chain from continuous recordings through configurable pre-processing to a model-ready dataset, and process the resulting data with compact depthwise separable 1D-CNNs. We conduct a joint search over pre-processing and model hyperparameters and identify a configuration with 8,722 parameters that uses band-pass filtering, fixed-length windows, and min-max normalization. On a self-recorded dataset with 15 participants performing six gestures, this configuration achieves high accuracy across different data-splitting methods, including strong user-independent performance in a leave-one-subject-out cross-validation.
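A minimal sketch of three of the steps named above (fixed-length windowing, min-max normalization, and leave-one-subject-out splitting) might look as follows. The function names, the window and hop sizes, and the choice to normalize each window separately are assumptions made for illustration; the band-pass filter is omitted, and none of this is the authors' implementation.

```python
def fixed_length_windows(signal, window_size, hop):
    """Cut a 1D recording into windows of equal length.

    Trailing samples that do not fill a complete window are dropped.
    """
    last_start = len(signal) - window_size
    return [signal[s:s + window_size] for s in range(0, last_start + 1, hop)]


def min_max_normalize(window):
    """Scale one window to the range [0, 1] (per-window scaling is an assumption)."""
    lo, hi = min(window), max(window)
    if hi == lo:
        # A constant window has no range; map it to zeros.
        return [0.0 for _ in window]
    return [(x - lo) / (hi - lo) for x in window]


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Toy data standing in for a recording and per-sample subject labels.
windows = fixed_length_windows(list(range(10)), window_size=4, hop=2)
normalized = [min_max_normalize(w) for w in windows]
folds = list(leave_one_subject_out(["a", "a", "b", "c"]))
```

With 15 participants, a split of this kind would produce 15 folds, each testing on one person whose data the model never saw during training.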