🤖 AI Summary
To address the high barrier to entry and heavy reliance on engineering expertise in data pipeline development, this paper proposes Prompt2DAG, a hybrid generative approach that converts natural language specifications into executable Apache Airflow DAGs. The method combines large language models (LLMs) for semantic understanding, structured template engines for deterministic syntactic constraints, and a multi-stage validation mechanism, balancing expressiveness with reliability. Generation quality is quantified with a penalized scoring framework covering three dimensions: SAT (code quality), DST (structural integrity), and PCT (executability). Across 260 experiments, the Hybrid approach achieves a 78.5% success rate, outperforming the LLM-only (66.2%) and Direct (29.2%) methods, and is over twice as cost-efficient as Direct prompting per successful DAG. The work offers a practical path toward democratizing data pipeline development.
📝 Abstract
Developing reliable data enrichment pipelines demands significant engineering expertise. We present Prompt2DAG, a methodology that transforms natural language descriptions into executable Apache Airflow DAGs. We evaluate four generation approaches (Direct, LLM-only, Hybrid, and Template-based) across 260 experiments using thirteen LLMs and five case studies to identify optimal strategies for production-grade automation. Performance is measured using a penalized scoring framework that combines reliability with code quality (SAT), structural integrity (DST), and executability (PCT). The Hybrid approach emerges as the optimal generative method, achieving a 78.5% success rate with robust quality scores (SAT: 6.79, DST: 7.67, PCT: 7.76). This significantly outperforms the LLM-only (66.2% success) and Direct (29.2% success) methods. Our findings show that reliability, not intrinsic code quality, is the primary differentiator. Cost-effectiveness analysis reveals the Hybrid method is over twice as efficient as Direct prompting per successful DAG. We conclude that a structured, hybrid approach is essential for balancing flexibility and reliability in automated workflow generation, offering a viable path to democratize data pipeline development.
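The counts in the abstract can be sanity-checked with a little arithmetic. The sketch below is not taken from the paper: it assumes every combination of LLM, case study, and method is run exactly once, which gives 65 runs per method, and it derives the whole-number success counts that the reported rates would then imply. The helper `cost_per_successful_dag` and the unit cost passed to it are hypothetical; the abstract does not state per-attempt costs, and those may well differ between methods.

```python
# Counts stated in the abstract.
NUM_LLMS = 13
NUM_CASE_STUDIES = 5
METHODS = ["Direct", "LLM-only", "Hybrid", "Template-based"]

# Assumption: each (LLM, case study, method) combination is run exactly once.
runs_per_method = NUM_LLMS * NUM_CASE_STUDIES
total_runs = runs_per_method * len(METHODS)

# Success rates reported in the abstract (none is given there for Template-based).
reported_success_rate = {"Direct": 0.292, "LLM-only": 0.662, "Hybrid": 0.785}


def implied_successes(rate: float, runs: int) -> int:
    """Nearest whole number of successful runs consistent with a reported rate."""
    return round(rate * runs)


def cost_per_successful_dag(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful DAG when failed attempts are paid for too."""
    return cost_per_attempt / success_rate


print("total runs:", total_runs)
for method, rate in reported_success_rate.items():
    print(method, "implied successes:", implied_successes(rate, runs_per_method))

# Hypothetical equal unit cost per attempt, used only to isolate the
# contribution of the success rate to cost per successful DAG.
ratio = cost_per_successful_dag(1.0, reported_success_rate["Direct"]) / \
    cost_per_successful_dag(1.0, reported_success_rate["Hybrid"])
print("Direct / Hybrid cost ratio at equal per-attempt cost:", round(ratio, 2))
```

Under the one-run-per-combination assumption, the total matches the 260 experiments stated in the abstract. The final ratio only shows how much of a cost gap the success rates alone could produce; it is not the paper's cost-effectiveness figure.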