Tabular Feature Discovery With Reasoning Type Exploration

📅 2025-06-25
🤖 AI Summary
To address the limited diversity, shallow semantics, and strong transformation bias of LLM-generated features in tabular feature engineering, this paper proposes REFeat, a feature discovery framework guided by multiple reasoning modes. The method explicitly models four structured reasoning pathways (inductive, analogical, causal, and counterfactual) and combines an adaptive reasoning-type selection mechanism with stepwise prompting to constrain LLM behavior during feature generation. Evaluated on 59 benchmark datasets, the framework improves the average prediction accuracy of downstream models (+1.8%) and yields features with higher information content, richer semantics, and lower redundancy. It mitigates inherent LLM biases in feature engineering and points toward more interpretable and controllable automated feature construction.
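
As a rough illustration of adaptive reasoning-type selection combined with stepwise prompting, the sketch below picks a reasoning type per round and builds a prompt for it. Everything here is an assumption for illustration: the summary does not say how selection works, so an epsilon-greedy rule stands in for it, and the names `ReasoningTypeSelector` and `build_prompt`, the prompt wording, and the reward definition (a validation-score gain from the generated feature) are hypothetical rather than taken from the paper.

```python
import random

# The four reasoning types named in the summary.
REASONING_TYPES = ["inductive", "analogical", "causal", "counterfactual"]


class ReasoningTypeSelector:
    """Epsilon-greedy choice over reasoning types.

    ASSUMPTION: epsilon-greedy is a stand-in; the paper's actual
    selection mechanism is not described in the summary.
    """

    def __init__(self, types, epsilon=0.1, seed=0):
        self.types = list(types)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {t: 0 for t in self.types}
        self.mean_reward = {t: 0.0 for t in self.types}

    def select(self):
        # Try each reasoning type once before exploiting.
        untried = [t for t in self.types if self.counts[t] == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon, otherwise pick the best so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.types)
        return max(self.types, key=lambda t: self.mean_reward[t])

    def update(self, reasoning_type, reward):
        # Incremental running mean of the observed reward for this type.
        self.counts[reasoning_type] += 1
        n = self.counts[reasoning_type]
        self.mean_reward[reasoning_type] += (
            reward - self.mean_reward[reasoning_type]
        ) / n


def build_prompt(reasoning_type, columns, target):
    """Hypothetical stepwise prompt: reason first, then give a formula."""
    return (
        f"Using {reasoning_type} reasoning, propose one new feature from "
        f"columns {', '.join(columns)} to help predict '{target}'. "
        "First explain the reasoning step by step, then give a formula."
    )
```

In a full loop, each round would call `select()`, send `build_prompt(...)` to an LLM, score the returned feature on held-out data, and pass the score gain to `update()`. Types whose features help more would then be chosen more often, while the exploration rate keeps the other types in play.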

📝 Abstract
Feature engineering for tabular data remains a critical yet challenging step in machine learning. Recently, large language models (LLMs) have been used to automatically generate new features by leveraging their vast knowledge. However, existing LLM-based approaches often produce overly simple or repetitive features, partly due to inherent biases in the transformations the LLM chooses and the lack of structured reasoning guidance during generation. In this paper, we propose a novel method, REFeat, which guides an LLM to discover diverse and informative features by leveraging multiple types of reasoning to steer the feature generation process. Experiments on 59 benchmark datasets demonstrate that our approach not only achieves higher predictive accuracy on average, but also discovers more diverse and meaningful features. These results highlight the promise of incorporating rich reasoning paradigms and adaptive strategy selection into LLM-driven feature discovery for tabular data.
Problem

Research questions and friction points this paper is trying to address.

Automating feature engineering for tabular data using LLMs
Overcoming simple/repetitive feature generation in LLM-based approaches
Enhancing feature diversity via structured reasoning guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Guides LLM with multiple reasoning types
Enhances feature diversity and informativeness
Achieves higher average predictive accuracy across 59 benchmark datasets