AI Summary
Existing feature transformation methods face scalability bottlenecks in large combinatorial spaces: discrete search strategies scale poorly, while continuous optimization is prone to local optima. To address this, we propose DIFFT, a reward-guided hierarchical diffusion generative framework. First, a variational autoencoder (VAE) constructs a compact latent space. Second, a latent diffusion model performs a global, robust search over feature representations under guidance from a performance evaluator (i.e., a reward signal). Third, a semi-autoregressive decoder jointly models intra-feature dependencies while enabling inter-feature parallel generation. DIFFT is the first method to reformulate feature transformation as a reward-driven diffusion generative process. Evaluated on 14 benchmark datasets, it significantly outperforms state-of-the-art approaches in prediction accuracy and robustness, while substantially reducing both training and inference time.
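The reward-guided denoising idea can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not the paper's implementation: the learned denoiser is replaced by an identity stand-in, and the evaluator's reward is a hypothetical quadratic `r(z) = -||z - z_star||^2`, whose gradient pulls the latent toward a high-reward point `z_star`.

```python
import numpy as np

def guided_denoise(z_T, reward_grad_fn, steps=50, guidance_scale=0.05, seed=0):
    """Toy reward-guided reverse diffusion in a latent space.

    Each step applies a denoiser update (identity stand-in here; in the
    real framework this would be one reverse step of the trained latent
    diffusion model) plus a gradient nudge from the reward signal.
    """
    rng = np.random.default_rng(seed)
    z = z_T.copy()
    for t in range(steps, 0, -1):
        z_denoised = z  # stand-in for the learned denoiser output
        # Reward guidance: steer the trajectory toward higher scores.
        z = z_denoised + guidance_scale * reward_grad_fn(z)
        if t > 1:  # keep a little stochasticity except at the final step
            z = z + 0.01 * rng.standard_normal(z.shape)
    return z

# Hypothetical evaluator: quadratic reward peaked at z_star.
z_star = np.ones(8)
grad = lambda z: -2.0 * (z - z_star)

z_T = np.random.default_rng(1).standard_normal(8)  # noisy starting latent
z_0 = guided_denoise(z_T, grad)
```

After the loop, the latent lands near the high-reward region, illustrating how guidance turns free-form generation into targeted optimization.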
Abstract
Feature Transformation (FT) crafts new features from original ones via mathematical operations to enhance dataset expressiveness for downstream models. However, existing FT methods exhibit critical limitations: discrete search struggles with enormous combinatorial spaces, impeding practical use, while continuous search, being highly sensitive to initialization and step sizes, often becomes trapped in local optima, restricting global exploration. To overcome these limitations, DIFFT redefines FT as a reward-guided generative task. It first learns a compact and expressive latent space for feature sets using a Variational Auto-Encoder (VAE). A Latent Diffusion Model (LDM) then navigates this space to generate high-quality feature embeddings, its trajectory guided by a performance evaluator toward task-specific optima. This synthesis of global distribution learning (from the LDM) and targeted optimization (from reward guidance) produces potent embeddings, which a novel semi-autoregressive decoder efficiently converts into structured, discrete features, preserving intra-feature dependencies while allowing parallel inter-feature generation. Extensive experiments on 14 benchmark datasets show DIFFT consistently outperforms state-of-the-art baselines in predictive accuracy and robustness, with significantly lower training and inference times.
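The semi-autoregressive decoding pattern described above (tokens within a feature generated sequentially, features generated in parallel) can be sketched as follows. This is an illustrative scaffold under assumed names: `next_token_fn` stands in for the learned decoder, and the toy per-feature token plans are hypothetical.

```python
def semi_autoregressive_decode(num_features, max_len, next_token_fn):
    """Decode several features at once: each outer step emits one token
    per still-active feature (inter-feature parallelism), and each token
    is conditioned only on its own feature's prefix (intra-feature
    autoregression)."""
    sequences = [[] for _ in range(num_features)]
    done = [False] * num_features
    for _ in range(max_len):
        for i in range(num_features):  # in practice, one batched forward pass
            if not done[i]:
                tok = next_token_fn(i, sequences[i])
                if tok == "<eos>":
                    done[i] = True
                else:
                    sequences[i].append(tok)
        if all(done):
            break
    return sequences

def toy_next_token(i, prefix):
    # Hypothetical per-feature token stream; a real decoder would sample
    # from a distribution conditioned on the prefix and the latent embedding.
    plan = [f"x{i}", "+", f"x{i + 1}", "<eos>"]
    return plan[len(prefix)]

features = semi_autoregressive_decode(3, max_len=5, next_token_fn=toy_next_token)
```

Each decoding step advances every unfinished feature by one token, so wall-clock decoding depth is bounded by the longest single feature rather than the total token count.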