🤖 AI Summary
This work addresses a common failure of large language models in real-world data pipelines: ambiguous instructions, complex task structures, and a lack of structured feedback lead them to generate code that is syntactically valid but semantically incorrect. The authors propose ProfiliTable, an autonomous multi-agent framework built around dynamic data profiling. It maintains a unified execution context shared by three core modules: a Profiler that explores the data in ReAct style, a Generator that retrieves curated operators to synthesize code, and an Evaluator-Summarizer that supplies execution scores and diagnostic feedback. Interactive exploration, knowledge-guided code synthesis, and closed-loop refinement together bring the generated code closer to user intent. On a benchmark covering 18 tabular task types, the approach consistently outperforms strong baselines, particularly in complex multi-step scenarios, which points to dynamic data profiling as a key factor in semantic correctness and governance compliance.
📝 Abstract
Table processing (including cleaning, transformation, augmentation, and matching) is a foundational yet error-prone stage in real-world data pipelines. While recent LLM-based approaches show promise for automating such tasks, they often struggle in practice due to ambiguous instructions, complex task structures, and the lack of structured feedback, resulting in syntactically correct but semantically flawed code. To address these challenges, we propose ProfiliTable, an autonomous multi-agent framework centered on dynamic profiling, which constructs and iteratively refines a unified execution context through interactive exploration, knowledge-augmented synthesis, and feedback-driven refinement. ProfiliTable integrates (i) a Profiler that performs ReAct-style data exploration to build semantic understanding, (ii) a Generator that retrieves curated operators to synthesize task-aware code, and (iii) an Evaluator-Summarizer loop that injects execution scores and diagnostic insights to enable closed-loop refinement. Extensive experiments on a diverse benchmark covering 18 tabular task types demonstrate that ProfiliTable consistently outperforms strong baselines, particularly in complex multi-step scenarios. These results highlight the critical role of dynamic profiling in reliably translating ambiguous user intents into robust and governance-compliant table transformations.
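The profile, generate, evaluate loop described in the abstract can be illustrated with a toy sketch. This is not ProfiliTable's implementation: the LLM agents are replaced by deterministic stand-ins, and every name here (ExecutionContext, profile_table, generate_plan, evaluate, OPERATORS, run) is hypothetical. It only shows how a shared context might carry a data profile and diagnostic feedback from one round to the next.

```python
from dataclasses import dataclass, field

Table = list[dict]  # a table as a list of row dicts (assumption for this sketch)


@dataclass
class ExecutionContext:
    """Hypothetical shared state read and updated by every stage."""
    instruction: str
    profile: dict = field(default_factory=dict)
    feedback: list[str] = field(default_factory=list)


def is_missing(value) -> bool:
    return value is None or value == ""


def profile_table(table: Table) -> dict:
    """Stand-in for the Profiler: count missing values per column."""
    columns = sorted({key for row in table for key in row})
    return {c: sum(is_missing(row.get(c)) for row in table) for c in columns}


def strip_whitespace(table: Table, profile: dict) -> Table:
    return [{k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
            for row in table]


def drop_incomplete_rows(table: Table, profile: dict) -> Table:
    return [row for row in table
            if not any(is_missing(row.get(c)) for c in profile)]


# Stand-in for a curated operator library; the real operators are not
# described in the source.
OPERATORS = {
    "strip_whitespace": strip_whitespace,
    "drop_incomplete_rows": drop_incomplete_rows,
}


def generate_plan(ctx: ExecutionContext) -> list[str]:
    """Stand-in for the Generator: choose operators using prior feedback."""
    plan = ["strip_whitespace"]
    if any("missing" in note for note in ctx.feedback):
        plan.append("drop_incomplete_rows")
    return plan


def evaluate(table: Table) -> tuple[float, str]:
    """Stand-in for the Evaluator-Summarizer: a score plus a diagnostic note."""
    cells = [v for row in table for v in row.values()]
    if not cells:
        return 0.0, "empty result"
    missing = sum(is_missing(v) for v in cells)
    note = f"{missing} missing cells remain" if missing else "ok"
    return 1 - missing / len(cells), note


def run(table: Table, instruction: str, max_rounds: int = 3,
        target: float = 1.0) -> tuple[Table, ExecutionContext]:
    """Closed loop: profile once, then generate, execute, evaluate, refine."""
    ctx = ExecutionContext(instruction, profile=profile_table(table))
    result = table
    for _ in range(max_rounds):
        result = table
        for name in generate_plan(ctx):
            result = OPERATORS[name](result, ctx.profile)
        score, note = evaluate(result)
        if score >= target:
            break
        ctx.feedback.append(note)  # diagnostic feeds the next round
    return result, ctx


rows = [{"name": " Ann ", "age": 30}, {"name": "Bob", "age": None}]
cleaned, context = run(rows, "clean the table")
print(cleaned, context.feedback)
```

In this toy run the first plan only trims whitespace, the evaluator reports a remaining missing cell, and the second plan adds a row filter in response. Whether dropping rows is the right repair depends on the user's instruction, which is the kind of ambiguity the paper's profiling step is meant to resolve.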