AI Summary
Geospatial entity resolution (ER) faces two challenges: modeling diverse geometry types (points, lines, polygons) and preserving spatial information. This paper proposes Omni, a framework that addresses both with (1) an omni-geometry encoder that embeds heterogeneous geometries in a single model while retaining fine-grained spatial structure; (2) an Attribute Affinity mechanism that applies pre-trained language models to the individual textual attributes of place records; and (3) an exploration of large language models (LLMs) for geospatial matching under different prompting strategies and learning scenarios. Evaluated on existing point-only datasets and a newly constructed diverse-geometry dataset, Omni improves F1 by up to 12% over existing methods. The LLM experiments show results competitive with those of pre-trained language model-based methods.
Abstract
The development, integration, and maintenance of geospatial databases rely heavily on efficient and accurate matching procedures of Geospatial Entity Resolution (ER). While resolution of points-of-interest (POIs) has been widely addressed, resolution of entities with diverse geometries has been largely overlooked. This is partly due to the lack of a uniform technique for embedding heterogeneous geometries seamlessly into a neural network framework. Existing neural approaches simplify complex geometries to a single point, resulting in significant loss of spatial information. To address this limitation, we propose Omni, a geospatial ER model featuring an omni-geometry encoder. This encoder is capable of embedding point, line, polyline, polygon, and multi-polygon geometries, enabling the model to capture the complex geospatial intricacies of the places being compared. Furthermore, Omni leverages transformer-based pre-trained language models over individual textual attributes of place records in an Attribute Affinity mechanism. The model is rigorously tested on existing point-only datasets and a new diverse-geometry geospatial ER dataset. Omni produces up to 12% (F1) improvement over existing methods.
Furthermore, we test the potential of Large Language Models (LLMs) to conduct geospatial ER, experimenting with prompting strategies and learning scenarios and comparing their results with those of pre-trained language model-based methods. The results indicate that LLMs are competitive.
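To make the information loss from point simplification concrete, the following is a minimal sketch in plain Python. It is not Omni's encoder: the helper names (`polygon_area`, `polygon_centroid`) and the two example squares are assumptions made for illustration. It shows two clearly different polygons that become indistinguishable once each is reduced to its centroid.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(ring: List[Point]) -> float:
    """Unsigned area of a simple polygon via the shoelace formula.

    The ring lists each vertex once (the first vertex is not repeated).
    """
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple, non-degenerate polygon."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # Centroid = sum / (6 * signed_area) = sum / (3 * twice_area).
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


# Hypothetical footprints: a small kiosk and a large block, both centered
# on the origin (coordinates are illustrative, in arbitrary planar units).
small = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
large = [(-10.0, -10.0), (10.0, -10.0), (10.0, 10.0), (-10.0, 10.0)]

same_point = polygon_centroid(small) == polygon_centroid(large)
area_ratio = polygon_area(large) / polygon_area(small)
print(same_point, area_ratio)
```

A matcher that sees only the centroids would treat these two footprints as the same location, although one covers 100 times the area of the other. This is the kind of spatial detail that an encoder operating on the full geometry can retain.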