Dynamic and Adaptive Feature Generation with LLM

📅 2024-06-04
🏛️ arXiv.org
📈 Citations: 21
Influential: 1
📄 PDF
🤖 AI Summary
Existing feature engineering approaches suffer from three fundamental limitations: poor interpretability, weak generalizability, and inflexible strategies—hindering practical deployment across diverse scenarios. To address these challenges, this paper proposes the first large language model (LLM)-driven dynamic adaptive feature generation paradigm. Our method integrates task-aware prompting with semantic modeling of the feature space, enabling real-time, interpretable, and controllable feature generation tailored to both data characteristics and task requirements. It ensures cross-modal and cross-task generality while maintaining full transparency in the feature generation process. Extensive experiments on multiple structured and unstructured data tasks demonstrate that features generated by our approach improve feature quality by 23.6% and boost downstream model performance by an average of 11.4%, significantly outperforming conventional automated feature engineering methods.

Technology Category

Application Category

📝 Abstract
The representation of feature space is a crucial environment where data points get vectorized and embedded for upcoming modeling. Thus the efficacy of machine learning (ML) algorithms is closely related to the quality of feature engineering. As one of the most important techniques, feature generation transforms raw data into an optimized feature space conducive to model training and further refines the space. Despite the advancements in automated feature engineering and feature generation, current methodologies often suffer from three fundamental issues: lack of explainability, limited applicability, and inflexible strategy. These shortcomings frequently hinder and limit the deployment of ML models across varied scenarios. Our research introduces a novel approach adopting large language models (LLMs) and feature-generating prompts to address these challenges. We propose a dynamic and adaptive feature generation method that enhances the interpretability of the feature generation process. Our approach broadens the applicability across various data types and tasks and draws advantages over strategic flexibility. A broad range of experiments showcases that our approach is significantly superior to existing methods.
Problem

Research questions and friction points this paper is trying to address.

Improves explainability of feature generation process
Enhances applicability across diverse data types
Increases strategic flexibility in feature engineering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLMs for dynamic feature generation
Enhances interpretability with adaptive prompts
Broadens applicability across diverse data types
🔎 Similar Papers
No similar papers found.
X
XinHao Zhang
Portland State University
J
Jinghan Zhang
Portland State University
Banafsheh Rekabdar
Banafsheh Rekabdar
Portland State University
Yuanchun Zhou
Yuanchun Zhou
Computer Network Information Center,CAS
Data MiningBig Data Analysis
P
Pengfei Wang
Computer Network Information Center, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences
Kunpeng Liu
Kunpeng Liu
Assistant Professor, Clemson University
Feature EngineeringLLM ReasoningReinforcement Learning