FloodSQL-Bench: A Retrieval-Augmented Benchmark for Geospatially-Grounded Text-to-SQL

📅 2025-12-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing Text-to-SQL benchmarks lack support for multi-table, geospatial, and cross-modal queries critical to high-stakes domains such as flood management. Method: We introduce FloodSQL, the first geospatially enhanced Text-to-SQL benchmark tailored for flood management, integrating heterogeneous data from social, infrastructure, and disaster sources. It supports key-based, spatial, and hybrid-join queries. We propose a geospatial-aware multi-table evaluation paradigm, featuring a difficulty-stratified taxonomy and a retrieval-augmented unified evaluation protocol. Schema-query pairs are constructed from real-world data, and large language models (LLMs) are evaluated via a RAG framework on spatial predicates, topological relations, and cross-modal alignment tasks. Contribution/Results: Experiments expose significant bottlenecks in LLMs’ spatial reasoning and multi-hop join capabilities. FloodSQL fills a critical gap in domain-specific structured query benchmarking and provides a reproducible, extensible evaluation platform for future research.

📝 Abstract

Existing Text-to-SQL benchmarks primarily focus on single-table queries or limited joins in general-purpose domains, and thus fail to reflect the complexity of domain-specific, multi-table, and geospatial reasoning. To address this limitation, we introduce FLOODSQL-BENCH, a geospatially grounded benchmark for the flood management domain that integrates heterogeneous datasets through key-based, spatial, and hybrid joins. The benchmark captures realistic flood-related information needs by combining social, infrastructural, and hazard data layers. We systematically evaluate recent large language models under the same retrieval-augmented generation settings and measure their performance across difficulty tiers. By providing a unified, open benchmark grounded in real-world disaster management data, FLOODSQL-BENCH establishes a practical testbed for advancing Text-to-SQL research in high-stakes application domains.
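To make the three join types named in the abstract concrete, the sketch below contrasts a key-based, a spatial, and a hybrid join on toy data. Everything in it is an illustrative assumption rather than part of the benchmark: the tables (`county`, `hospital`, `flood_zone`), their columns, the rows, and the `pt_in_bbox` helper are hypothetical. A bounding-box test stands in for a true spatial predicate because stock SQLite ships no geometry functions; the benchmark's actual schema and spatial engine are not described here.

```python
import sqlite3

def pt_in_bbox(x, y, xmin, ymin, xmax, ymax):
    """Return 1 if point (x, y) lies inside the axis-aligned box, else 0."""
    return int(xmin <= x <= xmax and ymin <= y <= ymax)

conn = sqlite3.connect(":memory:")
# Register the hypothetical predicate so SQL can call it (6 arguments).
conn.create_function("pt_in_bbox", 6, pt_in_bbox)

# Toy schema and rows, invented for illustration only.
conn.executescript("""
CREATE TABLE county (fips TEXT PRIMARY KEY, name TEXT);
CREATE TABLE hospital (id INTEGER PRIMARY KEY, name TEXT, fips TEXT,
                       x REAL, y REAL);
CREATE TABLE flood_zone (zone_id TEXT PRIMARY KEY,
                         xmin REAL, ymin REAL, xmax REAL, ymax REAL);
INSERT INTO county VALUES ('001', 'Alpha'), ('002', 'Beta');
INSERT INTO hospital VALUES
    (1, 'North Clinic',   '001', 1.0, 1.0),
    (2, 'River Hospital', '002', 5.0, 5.0),
    (3, 'Hill Hospital',  '002', 9.0, 9.0);
INSERT INTO flood_zone VALUES ('Z1', 4.0, 4.0, 6.0, 6.0);
""")

# Key-based join: rows are matched on a shared identifier (fips).
key_join = conn.execute("""
    SELECT h.name, c.name
    FROM hospital h JOIN county c ON h.fips = c.fips
    ORDER BY h.id
""").fetchall()

# Spatial join: rows are matched by a geometric predicate, not a key.
spatial_join = conn.execute("""
    SELECT h.name, z.zone_id
    FROM hospital h
    JOIN flood_zone z
      ON pt_in_bbox(h.x, h.y, z.xmin, z.ymin, z.xmax, z.ymax)
    ORDER BY h.id
""").fetchall()

# Hybrid join: a key-based join and a spatial join in the same query.
hybrid_join = conn.execute("""
    SELECT c.name, COUNT(*)
    FROM hospital h
    JOIN county c ON h.fips = c.fips
    JOIN flood_zone z
      ON pt_in_bbox(h.x, h.y, z.xmin, z.ymin, z.xmax, z.ymax)
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(key_join)
print(spatial_join)
print(hybrid_join)
```

In a spatially enabled database the bounding-box helper would typically be replaced by a topological predicate such as containment or intersection over real geometries. The hybrid query shows why these questions are harder for a model: it has to choose the correct key column and the correct spatial relation in a single statement.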

Problem

Research questions and friction points this paper is trying to address.

Addresses lack of geospatial reasoning in Text-to-SQL benchmarks

Integrates heterogeneous datasets for domain-specific flood management queries

Evaluates large language models on realistic multi-table joins

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces geospatially-grounded benchmark for flood management

Integrates heterogeneous datasets via key-based, spatial, and hybrid joins

Evaluates large language models with retrieval-augmented generation settings

🔎 Similar Papers

A Survey on Employing Large Language Models for Text-to-SQL Tasks