🤖 AI Summary
Existing feature engineering treats feature selection and generation as disjoint processes, leading to high redundancy, information loss, and suboptimal model performance. To address this, we propose a multi-agent collaborative feature enhancement paradigm that jointly models selection and generation as a unified decision-making problem. The framework comprises three specialized agents: Selector, Generator, and Router. The Router integrates long- and short-term memory—via in-context learning and a vector database—to enable global-local joint optimization. Moreover, we introduce the first application of offline Proximal Policy Optimization (PPO) for reinforcement-learning-based fine-tuning directly in feature space. Evaluated on multiple benchmark datasets, our method achieves an average AUC improvement of 3.2%, reduces feature dimensionality by over 40%, and preserves—or even enhances—discriminative capability, thereby significantly boosting downstream model performance.
📝 Abstract
As a widely used and practical tool, feature engineering transforms raw data into discriminative features to advance AI model performance. However, existing methods usually apply feature selection and generation separately, failing to strike a balance between reducing redundancy and adding meaningful dimensions. To fill this gap, we propose an agentic feature augmentation concept, where the unification of feature generation and selection is modeled as agentic teaming and planning. Specifically, we develop a Multi-Agent System with Long and Short-Term Memory (MAGS), comprising a selector agent to eliminate redundant features, a generator agent to produce informative new dimensions, and a router agent that strategically coordinates their actions. We leverage in-context learning with short-term memory for immediate feedback refinement and long-term memory for globally optimal guidance. Additionally, we employ offline Proximal Policy Optimization (PPO) reinforcement fine-tuning to train the router agent to make effective decisions while navigating a vast discrete feature space. Extensive experiments demonstrate that this unified agentic framework consistently achieves superior task performance by intelligently orchestrating feature selection and generation.
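The coordination pattern in the abstract (a router choosing between a selector and a generator, guided by short- and long-term memory) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the correlation-based utility `score`, the product-based generator, and above all the rule-based `route` function, which merely stands in for the PPO-trained, LLM-driven router. The long-term memory is a plain list rather than a vector database.

```python
import math
from collections import deque
from itertools import combinations

def abs_corr(a, b):
    """Absolute Pearson correlation; returns 0.0 for constant columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return abs(cov) / math.sqrt(va * vb)

def score(features, target):
    """Toy utility (assumption): mean relevance minus half the mean pairwise redundancy."""
    if not features:
        return 0.0
    cols = list(features.values())
    rel = sum(abs_corr(c, target) for c in cols) / len(cols)
    pairs = list(combinations(cols, 2))
    red = sum(abs_corr(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return rel - 0.5 * red

def select(features, target):
    """Selector stand-in: drop the one feature whose removal scores best."""
    if len(features) <= 1:
        return dict(features)
    def without(k):
        return {n: c for n, c in features.items() if n != k}
    drop = max(features, key=lambda k: score(without(k), target))
    return without(drop)

def generate(features, target):
    """Generator stand-in: add the product of the two most relevant features."""
    ranked = sorted(features, key=lambda k: abs_corr(features[k], target), reverse=True)
    for a, b in combinations(ranked, 2):
        name = f"({a}*{b})"
        if name not in features:
            new = dict(features)
            new[name] = [x * y for x, y in zip(features[a], features[b])]
            return new
    return dict(features)

def route(short_term):
    """Rule-based placeholder for the learned router: repeat what worked, else switch."""
    if not short_term:
        return "generate"
    last_action, last_reward = short_term[-1]
    if last_reward > 0:
        return last_action
    return "select" if last_action == "generate" else "generate"

def run(features, target, steps=6):
    short_term = deque(maxlen=3)                       # recent (action, reward) feedback
    current = dict(features)
    long_term = [(score(current, target), current)]    # best feature sets seen so far
    for _ in range(steps):
        action = route(short_term)
        agent = select if action == "select" else generate
        candidate = agent(current, target)
        reward = score(candidate, target) - score(current, target)
        short_term.append((action, reward))
        if reward > 0:                                 # keep only improving moves
            current = candidate
            long_term.append((score(current, target), current))
    return max(long_term, key=lambda item: item[0])

# Toy data: the target is the product of x1 and x2; dup is a redundant copy of x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
features = {"x1": x1, "x2": x2, "dup": [2 * v for v in x1]}
target = [a * b for a, b in zip(x1, x2)]

best_score, best_features = run(features, target)
print(round(best_score, 3), sorted(best_features))
```

In this sketch the short-term memory only records the last few rewards and the long-term memory only records improving feature sets; a fuller treatment would feed both into the router's prompt or policy, as the abstract describes.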