🤖 AI Summary
Existing feature transformation methods treat operations as isolated steps and neglect the dynamic dependencies among transformations, which makes the generation of high-value features inefficient and limits performance gains on downstream tasks. To address this, the authors propose a multi-agent collaborative reinforcement learning framework for automated feature engineering. The approach models dynamic interactions between features and transformations with an evolving interaction graph, introduces a graph pruning and backtracking mechanism that eliminates redundant transformation paths while keeping exploration stable, and reuses high-value subgraphs to support traceable, low-redundancy, and generalizable discovery of feature crossings. Evaluated on multiple benchmark datasets, the method significantly improves downstream model performance and reduces invalid transformations by over 30%, demonstrating both effectiveness and practical engineering utility.
📝 Abstract
Feature transformation methods aim to find an optimal mathematical feature-feature crossing process that generates high-value features and improves the performance of downstream machine learning tasks. Existing frameworks, though designed to reduce manual effort, often treat feature transformations as isolated operations and ignore the dynamic dependencies between transformation steps. To address these limitations, we propose TCTO, a collaborative multi-agent reinforcement learning framework that automates feature engineering through graph-driven path optimization. The framework's core innovation is an evolving interaction graph that models features as nodes and transformations as edges. Through graph pruning and backtracking, it dynamically eliminates low-impact edges, reduces redundant operations, and enhances exploration stability. The graph also provides full traceability, which allows TCTO to reuse high-utility subgraphs from historical transformations. To demonstrate the efficacy and adaptability of our approach, we conduct comprehensive experiments and case studies, which show superior performance across a range of datasets.
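To make the graph idea concrete, the following is a minimal sketch of an interaction graph with features as nodes and transformations as edges, plus simple pruning, backtracking, and tracing. It is not TCTO's implementation: the class name `InteractionGraph`, its methods, the numeric utility scores, and the rule that pruning a feature also removes everything derived from it are all assumptions made for illustration. The reinforcement learning agents are omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass
class InteractionGraph:
    """Toy graph: nodes are features, edges are transformations (hypothetical design)."""
    # generated feature -> (operation name, parent feature names)
    edges: dict = field(default_factory=dict)
    # feature name -> utility score (assumed to come from some downstream evaluation)
    scores: dict = field(default_factory=dict)

    def add_feature(self, name, score, op=None, parents=()):
        """Register a root feature (op=None) or a feature generated by `op` from `parents`."""
        self.scores[name] = score
        if op is not None:
            self.edges[name] = (op, tuple(parents))

    def trace(self, name):
        """Return the transformation steps that produced `name`, earliest first."""
        if name not in self.edges:
            return []  # root features have no history
        op, parents = self.edges[name]
        steps = []
        for parent in parents:
            for step in self.trace(parent):
                if step not in steps:
                    steps.append(step)
        steps.append((op, parents, name))
        return steps

    def snapshot(self):
        """Copy the current state so a later step can backtrack to it."""
        return dict(self.edges), dict(self.scores)

    def restore(self, snap):
        """Backtrack to a state previously returned by `snapshot`."""
        edges, scores = snap
        self.edges, self.scores = dict(edges), dict(scores)

    def prune(self, threshold):
        """Remove generated features scoring below `threshold` and anything derived from them."""
        removed = set()
        changed = True
        while changed:
            changed = False
            for child, (_, parents) in list(self.edges.items()):
                low_score = self.scores[child] < threshold
                orphaned = any(p in removed for p in parents)
                if low_score or orphaned:
                    del self.edges[child]
                    del self.scores[child]
                    removed.add(child)
                    changed = True
        return removed


# Usage: build a small graph, prune it, then backtrack.
g = InteractionGraph()
g.add_feature("a", 0.5)
g.add_feature("b", 0.4)
g.add_feature("a*b", 0.7, op="mul", parents=("a", "b"))
g.add_feature("log(a)", 0.1, op="log", parents=("a",))
g.add_feature("log(a)+b", 0.6, op="add", parents=("log(a)", "b"))

snap = g.snapshot()
removed = g.prune(threshold=0.3)  # drops "log(a)" and its descendant "log(a)+b"
g.restore(snap)                   # backtracking brings both features back
```

In this sketch, root features are never pruned because only generated features carry an incoming edge. Storing each edge as an (operation, parents) pair is what makes `trace` possible, which mirrors the traceability the abstract attributes to the graph; how TCTO actually scores edges and selects subgraphs for reuse is not specified here.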