🤖 AI Summary
Existing bio-signal feature extraction methods lack task-context awareness, struggle to adaptively select optimal features in high-dimensional spaces, and are prone to code-generation errors when automated. To address these limitations, we propose the first LLM-driven iterative feature generation framework that integrates domain expert knowledge with task-specific context: it employs large language models for context-aware multi-source feature synthesis, augmented by an evaluation-feedback-driven feature reselection mechanism and a multi-layer filtering and code-verification strategy, significantly enhancing feature quality and generation reliability. Evaluated across eight wearable bio-signal analysis tasks, our method achieves average AUROC improvements of 4.21%–9.67%; it surpasses state-of-the-art (SOTA) methods on five tasks and matches SOTA on the remaining three. This work pioneers the deep integration of LLMs into a closed-loop feature engineering pipeline, establishing a novel paradigm for task-adaptive, robust, and interpretable bio-signal modeling.
📝 Abstract
Biosignals collected from wearable devices are widely utilized in healthcare applications. Machine learning models used in these applications often rely on features extracted from biosignals due to their effectiveness, lower data dimensionality, and wide compatibility across various model architectures. However, existing feature extraction methods often lack task-specific contextual knowledge, struggle to identify optimal feature extraction settings in high-dimensional feature space, and are prone to code generation and automation errors. In this paper, we propose DeepFeature, the first LLM-empowered, context-aware feature generation framework for wearable biosignals. DeepFeature introduces a multi-source feature generation mechanism that integrates expert knowledge with task settings. It also employs an iterative feature refinement process that uses feature assessment-based feedback for feature re-selection. Additionally, DeepFeature applies a multi-layer filtering and verification approach for robust feature-to-code translation, ensuring that the generated extraction functions run without crashing. Experimental evaluation results show that DeepFeature achieves an average AUROC improvement of 4.21%–9.67% across eight diverse tasks compared to baseline methods. It outperforms state-of-the-art approaches on five tasks while maintaining comparable performance on the remaining three.
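The closed-loop pipeline the abstract describes (LLM-proposed features, code verification, assessment-based feedback for re-selection) can be sketched as follows. This is a minimal illustrative mock, not DeepFeature's actual implementation: the LLM call is replaced with a fixed stub, the scoring function is a toy stand-in for real feature assessment (e.g. validation AUROC), and all function and variable names are assumptions.

```python
# Hypothetical sketch of an LLM-driven iterative feature generation loop.
# The LLM is stubbed out; names are illustrative, not DeepFeature's API.
import statistics


def propose_features(task_context, feedback):
    """Stub standing in for an LLM that proposes feature-extraction code
    given the task description and feedback from earlier rounds."""
    return {
        "mean": "statistics.mean(signal)",
        "stdev": "statistics.stdev(signal)",
        "max": "max(signal)",
        "broken": "signal.undefined_attr",  # simulates a code-generation error
    }


def verify(name, code, signal):
    """Multi-layer check: generated code must compile and run without
    crashing on a sample signal; otherwise the feature is filtered out."""
    try:
        compiled = compile(code, f"<feature:{name}>", "eval")
        value = eval(compiled, {"statistics": statistics, "max": max},
                     {"signal": signal})
        return float(value)
    except Exception:
        return None


def assess(value):
    """Toy scoring stand-in for assessment on a validation set."""
    return abs(value)


def generate_features(signal, rounds=2):
    """Iterate: propose -> verify -> assess, feeding failures back as
    textual feedback for the next proposal round."""
    selected, feedback = {}, []
    for _ in range(rounds):
        for name, code in propose_features("wearable biosignal task",
                                           feedback).items():
            value = verify(name, code, signal)
            if value is None:
                feedback.append(f"{name}: code failed verification")
                continue
            selected[name] = assess(value)
    return selected
```

Running `generate_features([1.0, 2.0, 3.0])` keeps the three verifiable features and silently filters out the crashing one, mirroring the paper's filtering-and-verification idea at toy scale.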