🤖 AI Summary
This work addresses data-centric tabular learning by systematically investigating how reinforcement learning (RL) and generative AI can jointly optimize feature engineering. We propose the first unified analytical framework integrating RL algorithms, including PPO and DQN, with generative feature synthesis techniques such as TabGAN, TABI, VAEs, and LLM-driven methods, enabling automated feature selection and generation. Through a comprehensive review of over 100 studies, we delineate application boundaries across domains including healthcare and finance; we further design a scalable evaluation protocol and identify five key open challenges. Our core contribution is the first principled articulation, both theoretical and empirical, of cross-paradigm synergy between RL and generative approaches for tabular data, yielding significant improvements in data quality and downstream model performance.
📝 Abstract
Tabular data is one of the most widely used data formats across various domains such as bioinformatics, healthcare, and marketing. As artificial intelligence moves towards a data-centric perspective, improving data quality is essential for enhancing model performance in tabular data-driven applications. This survey focuses on data-driven tabular data optimization, specifically exploring reinforcement learning (RL) and generative approaches for feature selection and feature generation as fundamental techniques for refining data spaces. Feature selection aims to identify and retain the most informative attributes, while feature generation constructs new features to better capture complex data patterns. We systematically review existing generative methods for tabular data engineering, analyzing their latest advancements, real-world applications, and respective strengths and limitations. This survey emphasizes how RL-based and generative techniques contribute to the automation and intelligence of feature engineering. Finally, we summarize the existing challenges and discuss future research directions, aiming to provide insights that drive continued innovation in this field.
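To make the two operations concrete, here is a minimal sketch of feature generation followed by feature selection on a toy table. It is not one of the RL-based or generative methods the survey reviews. As a plain stand-in, it generates candidates by pairwise products and selects by absolute Pearson correlation with the target (a simple filter method). The helper names `pearson`, `generate_features`, and `select_features`, the synthetic columns `x0`..`x2`, and the interaction-driven target are all hypothetical illustrations. In an RL formulation, an agent would instead choose which generation operations to apply or which columns to keep, and would receive a downstream model score as its reward.

```python
import math
import random
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def generate_features(columns):
    """Feature generation (hypothetical helper): add pairwise products of original columns."""
    out = dict(columns)
    for (na, a), (nb, b) in combinations(columns.items(), 2):
        out[f"{na}*{nb}"] = [x * y for x, y in zip(a, b)]
    return out


def select_features(columns, target, k):
    """Feature selection (hypothetical helper): keep the k columns most correlated with the target."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Toy data: three independent columns and a target driven by an x0/x1 interaction
# that no single raw column captures linearly (an assumed example).
rng = random.Random(0)
n = 200
cols = {f"x{i}": [rng.uniform(-1, 1) for _ in range(n)] for i in range(3)}
y = [a * b for a, b in zip(cols["x0"], cols["x1"])]

expanded = generate_features(cols)  # 3 originals + 3 pairwise products
print(select_features(expanded, y, k=2))
```

The generated column `x0*x1` reproduces the target exactly, so it ranks first, while the raw columns are nearly uncorrelated with it. This illustrates why generation can expose patterns that selection over the original attributes alone would miss.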