FloodSQL-Bench: A Retrieval-Augmented Benchmark for Geospatially-Grounded Text-to-SQL

📅 2025-12-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Text-to-SQL benchmarks lack support for multi-table, geospatial, and cross-modal queries critical to high-stakes domains such as flood management. Method: We introduce FloodSQL, the first geospatially enhanced Text-to-SQL benchmark tailored for flood management, integrating heterogeneous data from social, infrastructure, and disaster sources. It supports key-based, spatial, and hybrid-join queries. We propose a geospatial-aware multi-table evaluation paradigm, featuring a difficulty-stratified taxonomy and a retrieval-augmented unified evaluation protocol. Schema-query pairs are constructed from real-world data, and large language models (LLMs) are evaluated via a RAG framework on spatial predicates, topological relations, and cross-modal alignment tasks. Contribution/Results: Experiments expose significant bottlenecks in LLMs’ spatial reasoning and multi-hop join capabilities. FloodSQL fills a critical gap in domain-specific structured query benchmarking and provides a reproducible, extensible evaluation platform for future research.

📝 Abstract
Existing Text-to-SQL benchmarks primarily focus on single-table queries or limited joins in general-purpose domains, and thus fail to reflect the complexity of domain-specific, multi-table, and geospatial reasoning. To address this limitation, we introduce FLOODSQL-BENCH, a geospatially grounded benchmark for the flood management domain that integrates heterogeneous datasets through key-based, spatial, and hybrid joins. The benchmark captures realistic flood-related information needs by combining social, infrastructural, and hazard data layers. We systematically evaluate recent large language models under the same retrieval-augmented generation settings and measure their performance across difficulty tiers. By providing a unified, open benchmark grounded in real-world disaster management data, FLOODSQL-BENCH establishes a practical testbed for advancing Text-to-SQL research in high-stakes application domains.
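To make the three join styles named in the abstract concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an assumption made for illustration: the tables (county, hospital, flood_zone), their columns, and the toy rows are hypothetical and are not taken from the benchmark's schema. Flood zones are simplified to axis-aligned bounding boxes so that the spatial predicate can be written in plain SQL; a real geospatial engine would presumably use a point-in-polygon function instead.

```python
import sqlite3

# Hypothetical toy schema; NOT the FLOODSQL-BENCH schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE county (fips TEXT PRIMARY KEY, name TEXT);
CREATE TABLE hospital (
    id INTEGER PRIMARY KEY, name TEXT, county_fips TEXT, lon REAL, lat REAL
);
-- Assumption: each flood zone is an axis-aligned bounding box.
CREATE TABLE flood_zone (
    id INTEGER PRIMARY KEY,
    min_lon REAL, min_lat REAL, max_lon REAL, max_lat REAL
);
INSERT INTO county VALUES ('001', 'Alpha'), ('002', 'Beta');
INSERT INTO hospital VALUES
    (1, 'North Clinic', '001', 0.5, 0.5),
    (2, 'South Clinic', '001', 5.0, 5.0),
    (3, 'East Clinic',  '002', 1.5, 0.2);
INSERT INTO flood_zone VALUES (10, 0.0, 0.0, 2.0, 1.0);
""")

# Key-based join: tables are linked through a shared identifier.
KEY_JOIN = """
SELECT h.name, c.name
FROM hospital h
JOIN county c ON h.county_fips = c.fips
ORDER BY h.id
"""

# Spatial join: tables are linked through a geometric predicate
# (here, point-inside-bounding-box).
SPATIAL_JOIN = """
SELECT h.name
FROM hospital h
JOIN flood_zone z
  ON h.lon BETWEEN z.min_lon AND z.max_lon
 AND h.lat BETWEEN z.min_lat AND z.max_lat
ORDER BY h.id
"""

# Hybrid join: one query combines a key-based and a spatial join.
HYBRID_JOIN = """
SELECT c.name, COUNT(*)
FROM hospital h
JOIN county c ON h.county_fips = c.fips
JOIN flood_zone z
  ON h.lon BETWEEN z.min_lon AND z.max_lon
 AND h.lat BETWEEN z.min_lat AND z.max_lat
GROUP BY c.name
ORDER BY c.name
"""

key_rows = conn.execute(KEY_JOIN).fetchall()
spatial_rows = conn.execute(SPATIAL_JOIN).fetchall()
hybrid_rows = conn.execute(HYBRID_JOIN).fetchall()

print(key_rows)      # every hospital paired with its county
print(spatial_rows)  # hospitals located inside a flood zone
print(hybrid_rows)   # per-county count of hospitals inside a flood zone
```

In this toy data, only the hybrid query needs both the identifier link and the geometric test, which is the kind of multi-step reasoning the benchmark is described as targeting.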
Problem

Research questions and friction points this paper is trying to address.

Addresses lack of geospatial reasoning in Text-to-SQL benchmarks
Integrates heterogeneous datasets for domain-specific flood management queries
Evaluates large language models on realistic multi-table joins
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces geospatially-grounded benchmark for flood management
Integrates heterogeneous datasets via key-based and spatial joins
Evaluates large language models with retrieval-augmented generation settings
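As a rough illustration of scoring generated SQL per difficulty tier, the sketch below computes execution accuracy with Python's built-in sqlite3 module. This is an assumption, not the paper's stated protocol: the summary does not specify the metric, so comparing order-insensitive result sets is a stand-in. The tier labels ("easy", "hard"), the shelter table, and the helper names are all hypothetical.

```python
import sqlite3
from collections import defaultdict


def run_query(conn, sql):
    """Run a query; return rows in a canonical order, or None if the SQL fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return sorted(rows, key=repr)


def execution_accuracy_by_tier(conn, examples):
    """examples: iterable of (tier, gold_sql, predicted_sql) tuples.

    A prediction counts as correct when it executes and returns the same
    rows as the gold query, ignoring row order (an assumed criterion).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for tier, gold_sql, predicted_sql in examples:
        total[tier] += 1
        gold = run_query(conn, gold_sql)
        pred = run_query(conn, predicted_sql)
        if gold is not None and pred == gold:
            correct[tier] += 1
    return {tier: correct[tier] / total[tier] for tier in total}


# Hypothetical toy database and predictions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shelter (id INTEGER PRIMARY KEY, capacity INTEGER);
INSERT INTO shelter VALUES (1, 100), (2, 250);
""")

examples = [
    ("easy", "SELECT COUNT(*) FROM shelter", "SELECT COUNT(id) FROM shelter"),
    # Misspelled column: the predicted query fails and is scored as wrong.
    ("hard", "SELECT SUM(capacity) FROM shelter", "SELECT SUM(capacty) FROM shelter"),
    ("hard", "SELECT MAX(capacity) FROM shelter", "SELECT MAX(capacity) FROM shelter"),
]

scores = execution_accuracy_by_tier(conn, examples)
print(scores)  # {'easy': 1.0, 'hard': 0.5}
```

Scoring a failed query as incorrect, rather than skipping it, keeps each tier's denominator fixed so that models are compared on the same set of questions.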
Hanzhou Liu
Texas A&M University
Kai Yin
Expedia Group
Transportation, Autonomous Vehicles, Stochastic Modeling, Applied Statistics, Optimization
Zhitong Chen
Texas A&M University
Chenyue Liu
Texas A&M University
Ali Mostafavi
Texas A&M University