CogniQ-H: A Soft Hierarchical Reinforcement Learning Paradigm for Automated Data Preparation

📅 2025-07-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Data preparation faces challenges including an exponentially large search space, difficulty in modeling structural hierarchical properties, and low sample efficiency of conventional reinforcement learning. This paper proposes the first soft-hierarchical reinforcement learning framework tailored for data preparation. It probabilistically couples high-level strategy priors—generated by large language models—with low-level operation quality assessments—provided by supervised ranking models—via Bayesian inference, thereby avoiding the irreversibility inherent in hard-hierarchical decision-making. Additionally, it integrates long-horizon value estimation from agent Q-functions to enable end-to-end collaborative decision-making. Extensive experiments across 18 cross-domain datasets demonstrate that our method improves pipeline quality by up to 13.9% over the strongest baseline while accelerating convergence by 2.8×.

📝 Abstract
Data preparation is a foundational yet notoriously challenging component of the machine learning lifecycle, characterized by a vast combinatorial search space of potential operator sequences. While reinforcement learning (RL) offers a promising direction, existing approaches are inefficient as they fail to capture the structured, hierarchical nature of the problem. We argue that Hierarchical Reinforcement Learning (HRL), a paradigm that has been successful in other domains, provides a conceptually ideal yet previously unexplored framework for this task. However, a naive HRL implementation with a 'hard hierarchy' is prone to suboptimal, irreversible decisions. To address this, we introduce CogniQ-H, the first framework to implement a soft hierarchical paradigm for robust, end-to-end automated data preparation. CogniQ-H formulates action selection as a Bayesian inference problem. A high-level strategic prior, generated by a Large Language Model (LLM), guides exploration probabilistically. This prior is synergistically combined with a fine-grained operator quality score from a supervised Learning-to-Rank (LTR) model and a long-term value estimate from the agent's own Q-function. This hybrid architecture allows CogniQ-H to balance strategic guidance with adaptive, evidence-based decision-making. Through extensive experiments on 18 diverse datasets spanning multiple domains, we demonstrate that CogniQ-H achieves up to 13.9% improvement in pipeline quality and 2.8× faster convergence compared to state-of-the-art RL-based methods.
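The abstract describes action selection as Bayesian inference: an LLM-derived strategic prior is fused with an LTR quality score and the agent's own Q-value estimate, and an operator is then sampled rather than chosen by a hard argmax. The sketch below illustrates one plausible form of such a fusion, combining the three signals in log space into a softmax-style posterior. The function name, the temperature `beta`, and the equal weighting of the LTR and Q terms are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def soft_select(llm_prior, ltr_scores, q_values, beta=1.0, rng=None):
    """Fuse three signals into a posterior over candidate operators.

    llm_prior  : strategic prior over operators from an LLM (sums to 1)
    ltr_scores : per-operator quality scores from a learning-to-rank model
    q_values   : long-horizon value estimates from the agent's Q-function
    beta       : temperature weighting the likelihood terms (assumed)
    """
    rng = rng or np.random.default_rng()
    # Treat the LTR scores and Q-values as log-likelihood terms and
    # combine them with the log-prior (Bayes' rule in log space).
    logits = (np.log(np.asarray(llm_prior, dtype=float) + 1e-12)
              + beta * np.asarray(ltr_scores, dtype=float)
              + beta * np.asarray(q_values, dtype=float))
    # Numerically stable softmax normalization.
    post = np.exp(logits - logits.max())
    post /= post.sum()
    # Sample from the posterior instead of taking a hard argmax, so
    # low-prior operators remain reachable (the "soft" hierarchy that
    # avoids irreversible hard-hierarchical commitments).
    action = rng.choice(len(post), p=post)
    return action, post
```

Sampling from the posterior, rather than committing to the top-ranked operator, is what distinguishes the soft hierarchy from a hard one: a misleading LLM prior can be overridden by accumulating evidence from the LTR and Q-function terms.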
Problem

Research questions and friction points this paper is trying to address.

Automating complex data preparation in machine learning
Addressing inefficiency in hierarchical reinforcement learning approaches
Balancing strategic guidance with adaptive decision-making
Innovation

Methods, ideas, or system contributions that make the work stand out.

Soft hierarchical reinforcement learning for data preparation
Bayesian inference for action selection with LLM guidance
Hybrid architecture combining LTR and Q-function estimates