LLM-Driven Reasoning for Constraint-Aware Feature Selection in Industrial Systems

📅 2026-03-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses feature selection in industrial settings where labeled data are scarce and multiple operational constraints must be satisfied, a scenario in which conventional methods that depend on labeled data and statistical heuristics are difficult to apply. The authors propose the Model Feature Agent (MoFA) framework, which applies the reasoning capabilities of large language models to industrial-scale feature selection. MoFA uses structured prompts that combine feature semantics, quantitative metrics (importance scores and correlations), and feature metadata, and it selects features sequentially through interpretable, constraint-aware reasoning. Evaluations in three real-world industrial applications show that MoFA improves model accuracy while reducing feature group complexity, discovers high-order interaction terms that yield substantial engagement gains in online experiments, and selects compact feature subsets that improve inference efficiency.

📝 Abstract

Feature selection is a crucial step in large-scale industrial machine learning systems, directly affecting model accuracy, efficiency, and maintainability. Traditional feature selection methods rely on labeled data and statistical heuristics, making them difficult to apply in production environments where labeled data are limited and multiple operational constraints must be satisfied. To address this, we propose Model Feature Agent (MoFA), a model-driven framework that performs sequential, reasoning-based feature selection using both semantic and quantitative feature information. MoFA incorporates feature definitions, importance scores, correlations, and metadata (e.g., feature groups or types) into structured prompts and selects features through interpretable, constraint-aware reasoning. We evaluate MoFA in three real-world industrial applications: (1) True Interest and Time-Worthiness Prediction, where it improves accuracy while reducing feature group complexity, (2) Value Model Enhancement, where it discovers high-order interaction terms that yield substantial engagement gains in online experiments, and (3) Notification Behavior Prediction, where it selects compact, high-value feature subsets that improve both model accuracy and inference efficiency. Together, these results demonstrate the practicality and effectiveness of LLM-based reasoning for feature selection in real production systems.
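The abstract describes packing feature definitions, importance scores, correlations, and group metadata into a structured prompt and then selecting features under constraints. The sketch below illustrates one way such a prompt could be assembled and how a proposed selection could be checked against simple constraints. It is not the authors' implementation: the `Feature` class, the prompt layout, the two constraints (`max_features`, `max_groups`), and all feature names and numbers are hypothetical, and the call to the language model itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    definition: str    # semantic description shown to the model
    importance: float  # quantitative score, e.g. from an existing model
    group: str         # metadata: feature group or type


def build_prompt(features, correlations, max_features, max_groups):
    """Render feature metadata and constraints as a structured text prompt."""
    lines = ["## Candidate features"]
    # List the highest-importance features first.
    for f in sorted(features, key=lambda f: f.importance, reverse=True):
        lines.append(
            f"- {f.name} (group={f.group}, importance={f.importance:.3f}): {f.definition}"
        )
    lines.append("## Pairwise correlations")
    for (a, b), r in sorted(correlations.items()):
        lines.append(f"- {a} ~ {b}: {r:+.2f}")
    lines.append("## Constraints")
    lines.append(f"- Select at most {max_features} features.")
    lines.append(f"- Use at most {max_groups} feature groups.")
    lines.append("## Task")
    lines.append("Select ONE feature to add next and explain why in one sentence.")
    return "\n".join(lines)


def satisfies_constraints(selected, features, max_features, max_groups):
    """Check a proposed selection (list of names) against the constraints."""
    by_name = {f.name: f for f in features}
    if any(name not in by_name for name in selected):
        return False  # the model proposed a feature that does not exist
    if len(set(selected)) != len(selected):
        return False  # duplicate selections are not allowed
    groups = {by_name[name].group for name in selected}
    return len(selected) <= max_features and len(groups) <= max_groups


# Hypothetical example data, for illustration only.
FEATURES = [
    Feature("clicks_7d", "Clicks in the last 7 days", 0.42, "engagement"),
    Feature("clicks_30d", "Clicks in the last 30 days", 0.38, "engagement"),
    Feature("account_age_days", "Days since account creation", 0.11, "profile"),
]
CORRELATIONS = {("clicks_7d", "clicks_30d"): 0.91}

print(build_prompt(FEATURES, CORRELATIONS, max_features=2, max_groups=1))
```

In a sequential setup, the prompt would be rebuilt after each accepted feature, and `satisfies_constraints` would act as a guard that rejects any model proposal violating the stated limits before it reaches the selected set.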

Problem

Research questions and friction points this paper is trying to address.

feature selection

industrial machine learning

labeled data scarcity

operational constraints

constraint-aware

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven reasoning

constraint-aware feature selection

Model Feature Agent