🤖 AI Summary
This paper targets complex bilingual NL2SQL tasks that require arithmetic, commonsense, and hypothetical reasoning, along with the transliteration and entity-mismatch problems that arise across languages. It proposes an agent-based framework centered on stepwise natural language planning, in which a Planner Agent writes the plan and an SQL Agent converts it into executable SQL. The planner is refined through feedback-driven meta-prompt optimization: failure cases are clustered with human input, and an LLM distills them into corrective guidelines that are added to the planner's system prompt. Entity-linking guidelines handle cross-lingual surface-form variation, while multi-candidate plan generation with majority voting over execution results improves robustness. On the English and Chinese benchmarks, the framework achieves execution accuracies of 55.0% and 56.7%, respectively, outperforming the second-best system by more than six percentage points while keeping SQL validity above 99%.
📝 Abstract
We present OraPlan-SQL, our system for the Archer NL2SQL Evaluation Challenge 2025, a bilingual benchmark requiring complex reasoning such as arithmetic, commonsense, and hypothetical inference. OraPlan-SQL ranked first, exceeding the second-best system by more than 6% in execution accuracy (EX), with 55.0% in English and 56.7% in Chinese, while maintaining over 99% SQL validity (VA). Our system follows an agentic framework with two components: a Planner agent that generates stepwise natural language plans, and an SQL agent that converts these plans into executable SQL. Because the SQL agent reliably adheres to the plan, our refinements focus on the planner. Unlike prior methods that rely on multiple sub-agents for planning and suffer from orchestration overhead, we introduce a feedback-guided meta-prompting strategy to refine a single planner. Failure cases from a held-out set are clustered with human input, and an LLM distills them into corrective guidelines that are integrated into the planner's system prompt, improving generalization without added complexity. For the multilingual scenario, we address transliteration and entity-mismatch issues by incorporating entity-linking guidelines that generate alternative surface forms for entities and explicitly include them in the plan. Finally, we enhance reliability through plan diversification: multiple candidate plans are generated for each query, the SQL agent produces one SQL query per plan, and the final output is selected by majority voting over the execution results.
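The final selection step, majority voting over execution results, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `execute` and `majority_vote`, the use of SQLite, the order-insensitive comparison of result rows, and the tie-breaking rule (first candidate reaching the top count wins) are all assumptions made for illustration. The candidate SQL strings stand in for the queries the SQL agent would produce from different plans.

```python
import sqlite3
from collections import Counter


def execute(conn, sql):
    """Run one candidate query.

    Returns a hashable, order-insensitive view of the result rows,
    or None if the query fails to execute (assumed handling).
    """
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))


def majority_vote(conn, candidates):
    """Return the first candidate whose execution result is the most common one.

    Candidates that fail to execute are ignored; returns None if all fail.
    """
    executed = [(sql, execute(conn, sql)) for sql in candidates]
    valid = [(sql, result) for sql, result in executed if result is not None]
    if not valid:
        return None
    counts = Counter(result for _, result in valid)
    winning_result, _ = counts.most_common(1)[0]
    return next(sql for sql, result in valid if result == winning_result)


# Toy database and hypothetical candidates, one per generated plan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("south", 20), ("north", 5)],
)

candidates = [
    "SELECT SUM(amount) FROM sales WHERE region = 'north'",
    "SELECT SUM(s.amount) FROM sales AS s WHERE s.region = 'north'",
    "SELECT SUM(amount) FROM sales",                        # ignores the filter
    "SELECT SUM(amount) FROM sale WHERE region = 'north'",  # invalid table name
]

chosen = majority_vote(conn, candidates)
print(chosen)
```

Voting on execution results rather than on SQL text lets syntactically different but equivalent queries reinforce one another, which is the property the plan-diversification step relies on.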