OsmT: Bridging OpenStreetMap Queries and Natural Language with Open-source Tag-aware Language Models

📅 2025-12-04
🤖 AI Summary
This work addresses the high cost, low transparency, and deployment difficulty of large closed-source models for natural language-to-OverpassQL (NL2OverpassQL) translation. The authors propose OsmT, a lightweight open-source model built around a Tag Retrieval Augmentation (TRA) mechanism that brings contextually relevant OpenStreetMap (OSM) tag knowledge into generation and is designed to capture the hierarchical and relational dependencies among tags. They also define a reverse task, OverpassQL-to-Text, which turns structured queries into natural language explanations. The approach builds on open pre-trained language models and covers both directions: NL2OverpassQL and OverpassQL-to-Text. On a public benchmark, OsmT shows consistent improvements over strong baselines in query generation and interpretation, and reaches competitive accuracy with significantly fewer parameters, which supports its suitability for lightweight deployment.

📝 Abstract
Bridging natural language and structured query languages is a long-standing challenge in the database community. While recent advances in language models have shown promise in this direction, existing solutions often rely on large-scale closed-source models that suffer from high inference costs, limited transparency, and lack of adaptability for lightweight deployment. In this paper, we present OsmT, an open-source tag-aware language model specifically designed to bridge natural language and Overpass Query Language (OverpassQL), a structured query language for accessing large-scale OpenStreetMap (OSM) data. To enhance the accuracy and structural validity of generated queries, we introduce a Tag Retrieval Augmentation (TRA) mechanism that incorporates contextually relevant tag knowledge into the generation process. This mechanism is designed to capture the hierarchical and relational dependencies present in the OSM database, addressing the topological complexity inherent in geospatial query formulation. In addition, we define a reverse task, OverpassQL-to-Text, which translates structured queries into natural language explanations to support query interpretation and improve user accessibility. We evaluate OsmT on a public benchmark against strong baselines and observe consistent improvements in both query generation and interpretation. Despite using significantly fewer parameters, our model achieves competitive accuracy, demonstrating the effectiveness of open-source pre-trained language models in bridging natural language and structured query languages within schema-rich geospatial environments.
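The abstract does not say how TRA selects or injects tags, so the following is only a minimal sketch of the general retrieve-then-prompt idea, not the authors' implementation. The tag catalog, the token-overlap scoring, and the names `TAG_CATALOG`, `retrieve_tags`, and `build_prompt` are all assumptions made for illustration; a real system would more likely use a learned retriever over the full OSM tag vocabulary.

```python
import re

# Hypothetical mini catalog of OSM tags: (key, value, short description).
# The descriptions are illustrative, not taken from the paper or its data.
TAG_CATALOG = [
    ("amenity", "cafe", "place serving coffee and light meals"),
    ("amenity", "pharmacy", "shop selling medicines"),
    ("tourism", "hotel", "accommodation for travellers"),
    ("highway", "bus_stop", "stop for public transport"),
    ("leisure", "park", "public green space"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve_tags(question, catalog=TAG_CATALOG, k=2):
    """Return up to k tags ranked by token overlap with the question.

    Token overlap is a stand-in for whatever retriever a real system uses.
    """
    question_tokens = set(tokenize(question))
    scored = []
    for key, value, description in catalog:
        tag_tokens = set(tokenize(f"{key} {value} {description}"))
        score = len(question_tokens & tag_tokens)
        if score > 0:
            scored.append((score, f"{key}={value}"))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tag for _, tag in scored[:k]]


def build_prompt(question, k=2):
    """Prepend the retrieved tags to the question as model input."""
    tags = retrieve_tags(question, k=k)
    tag_line = ", ".join(tags) if tags else "none"
    return f"Relevant tags: {tag_line}\nQuestion: {question}\nOverpassQL:"


if __name__ == "__main__":
    print(build_prompt("Find every cafe that serves coffee in Berlin"))
```

The point of the sketch is the data flow: the generator sees candidate `key=value` tags next to the question, so it does not have to recall the tag vocabulary from its parameters alone.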
Problem

Research questions and friction points this paper is trying to address.

Bridging natural language and Overpass Query Language for OpenStreetMap
Enhancing query accuracy and structural validity with a Tag Retrieval Augmentation (TRA) mechanism
Translating structured queries to natural language for user accessibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source tag-aware language model for OSM queries
Tag Retrieval Augmentation mechanism for query accuracy
A reverse task, OverpassQL-to-Text, for query interpretation
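As a rough illustration of the two task directions, one annotated pair can be turned into a training example for each direction. This is a sketch under assumptions: the task prefixes, the field names, and the helper `make_examples` are hypothetical, and the question/query pair is written for illustration rather than taken from the benchmark.

```python
def make_examples(question, query):
    """Build one example per direction from a single (question, query) pair.

    The prefixes and dict layout are illustrative assumptions, not the
    paper's input format.
    """
    return [
        {
            "task": "nl2overpassql",
            "source": "translate to OverpassQL: " + question,
            "target": query,
        },
        {
            "task": "overpassql2text",
            "source": "explain OverpassQL: " + query,
            "target": question,
        },
    ]


if __name__ == "__main__":
    # Illustrative pair: cafes inside the area named Berlin.
    question = "Find all cafes in Berlin"
    query = 'area[name="Berlin"]->.a;node["amenity"="cafe"](area.a);out;'
    for example in make_examples(question, query):
        print(example["task"], "|", example["source"], "->", example["target"])
```

Deriving both examples from the same pair keeps the two tasks aligned: the text the model learns to produce in one direction is the input it learns to read in the other.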