ToPT: Task-Oriented Prompt Tuning for Urban Region Representation Learning

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing urban region representation methods, which are typically task-agnostic and lack both explicit spatial priors and mechanisms for aligning with task semantics. To overcome this, the authors propose ToPT, a two-stage framework. The first stage improves inter-region interaction modeling by incorporating learnable spatial priors, such as distance and centrality, to generate spatially aware region embeddings. In the second stage, a frozen multimodal large language model (MLLM) processes task-specific prompt templates to produce task semantic vectors, which are then aligned with the region representations via multi-head cross-attention. Evaluated across multiple cities and downstream tasks, the method achieves state-of-the-art performance, with improvements of up to 64.2% on the reported metrics, demonstrating that spatial priors and prompt-driven semantic alignment are both effective and complementary.
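To make the idea of spatial priors as learnable attention biases concrete, here is a minimal single-head sketch in plain Python, loosely following the Graphormer pattern of adding learned scalars to the attention logits. It is not the ToPT/SREL implementation: the discretization of distances and centralities (`bucketize`, `dist_edges`, `cent_edges`), the per-bucket bias tables, the choice to apply the centrality bias on the key side, and all function names are hypothetical. In a trained model the bias tables would be learned parameters rather than fixed lists.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    # W has shape (d_out, d_in); returns W @ x.
    return [dot(row, x) for row in W]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bucketize(value: float, edges: Sequence[float]) -> int:
    # Hypothetical discretization: index of the first edge greater than value.
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def spatially_biased_attention(
    H: Matrix,                    # region features, one row per region
    dist: Matrix,                 # pairwise distances between regions
    cent: Sequence[float],        # per-region centrality scores
    Wq: Matrix, Wk: Matrix, Wv: Matrix,
    dist_bias: Sequence[float],   # one (learnable) scalar per distance bucket
    cent_bias: Sequence[float],   # one (learnable) scalar per centrality bucket
    dist_edges: Sequence[float],
    cent_edges: Sequence[float],
) -> Tuple[Matrix, Matrix]:
    Q = [matvec(Wq, h) for h in H]
    K = [matvec(Wk, h) for h in H]
    V = [matvec(Wv, h) for h in H]
    scale = math.sqrt(len(Q[0]))
    n = len(H)
    outputs, weights = [], []
    for i in range(n):
        logits = []
        for j in range(n):
            score = dot(Q[i], K[j]) / scale
            # Spatial priors enter as additive biases on the attention logits.
            score += dist_bias[bucketize(dist[i][j], dist_edges)]
            score += cent_bias[bucketize(cent[j], cent_edges)]
            logits.append(score)
        attn = softmax(logits)
        weights.append(attn)
        outputs.append([sum(attn[j] * V[j][c] for j in range(n))
                        for c in range(len(V[0]))])
    return outputs, weights


# Usage: zero query weights isolate the effect of the spatial biases.
I2 = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dist = [[0.0, 3.0, 10.0], [3.0, 0.0, 3.0], [10.0, 3.0, 0.0]]
cent = [0.2, 0.9, 0.2]
out, attn = spatially_biased_attention(
    H, dist, cent, Wq=Z2, Wk=Z2, Wv=I2,
    dist_bias=[0.0, -1.0, -2.0], cent_bias=[0.0, 0.5],
    dist_edges=[1.0, 5.0], cent_edges=[0.5],
)
```

In this toy setup, region 0 attends most to itself, then to the nearby and more central region 1, and least to the distant region 2, purely because of the bias terms. Additive logit biases keep the prior inside the softmax, so attention rows still sum to one.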

📝 Abstract
Learning effective region embeddings from heterogeneous urban data underpins key urban computing tasks (e.g., crime prediction, resource allocation). However, prevailing two-stage methods yield task-agnostic representations that are decoupled from downstream objectives. Recent prompt-based approaches attempt to fix this but face two challenges: they often lack explicit spatial priors, which leads to spatially incoherent inter-region modeling, and they lack robust mechanisms for explicit task-semantic alignment. We propose ToPT, a two-stage framework that delivers spatially consistent fusion and explicit task alignment. ToPT consists of two modules: spatial-aware region embedding learning (SREL) and task-aware prompting for region embeddings (Prompt4RE). SREL employs a Graphormer-based fusion module that injects spatial priors (distance and regional centrality) as learnable attention biases to capture coherent, interpretable inter-region interactions. Prompt4RE performs task-oriented prompting: a frozen multimodal large language model (MLLM) processes task-specific templates to obtain semantic vectors, which are aligned with region embeddings via multi-head cross-attention for stable task conditioning. Experiments across multiple tasks and cities show state-of-the-art performance, with improvements of up to 64.2%, validating the necessity and complementarity of spatial priors and prompt-region alignment. The code is available at https://github.com/townSeven/Prompt4RE.git.
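The Prompt4RE alignment step can be sketched in the same plain-Python style. In this hypothetical version, region embeddings act as queries and the task semantic vectors act as keys and values, and the attended task context is added back to each region through a residual connection. The frozen MLLM is not modeled: `task_tokens` merely stands in for the vectors it would produce. The query/key/value roles, the residual, the omission of an output projection, and all names here are assumptions rather than details stated in the abstract.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    # W has shape (d_out, d_in); returns W @ x.
    return [dot(row, x) for row in W]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_cross_attention(regions: Matrix, task_tokens: Matrix,
                               Wq: Matrix, Wk: Matrix, Wv: Matrix,
                               num_heads: int) -> Matrix:
    d = len(Wq)  # projected dimension, split evenly across heads
    if d % num_heads != 0:
        raise ValueError("projection dim must be divisible by num_heads")
    d_h = d // num_heads
    Q = [matvec(Wq, r) for r in regions]      # queries from regions
    K = [matvec(Wk, t) for t in task_tokens]  # keys from task vectors
    V = [matvec(Wv, t) for t in task_tokens]  # values from task vectors
    outputs = []
    for q in Q:
        fused: List[float] = []
        for h in range(num_heads):
            lo, hi = h * d_h, (h + 1) * d_h
            # Each head attends over the task vectors using its own slice.
            scores = [dot(q[lo:hi], k[lo:hi]) / math.sqrt(d_h) for k in K]
            w = softmax(scores)
            fused.extend(sum(w[t] * V[t][c] for t in range(len(V)))
                         for c in range(lo, hi))
        outputs.append(fused)
    return outputs


def condition_regions(regions: Matrix, task_tokens: Matrix,
                      Wq: Matrix, Wk: Matrix, Wv: Matrix,
                      num_heads: int) -> Matrix:
    attended = multi_head_cross_attention(regions, task_tokens,
                                          Wq, Wk, Wv, num_heads)
    # Residual connection: keep the spatial embedding, add task context.
    return [[r + a for r, a in zip(reg, att)]
            for reg, att in zip(regions, attended)]


I2 = [[1.0, 0.0], [0.0, 1.0]]
regions = [[1.0, 0.0], [0.0, 1.0]]
task_tokens = [[0.5, 0.5]]
conditioned = condition_regions(regions, task_tokens, I2, I2, I2, num_heads=2)
# conditioned == [[1.5, 0.5], [0.5, 1.5]]
```

With a single task vector every attention weight equals 1, so each region simply gains that vector. With several task vectors, the per-head weights decide how much of each one a region absorbs.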
Problem

Research questions and friction points this paper is trying to address.

urban region representation learning
task-agnostic representations
spatial priors
task-semantic alignment
prompt-based learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

prompt tuning
spatial priors
task-oriented alignment
Graphormer
urban representation learning
Zitao Guo
College of Applied Science, Shenzhen University, Shenzhen, China
Changyang Jiang
School of Artificial Intelligence, Shenzhen Technology University, Shenzhen, China
Tianhong Zhao
Shenzhen Technology University
GIS, Urban Informatics, Spatiotemporal Prediction
Jinzhou Cao
School of Artificial Intelligence, Shenzhen Technology University, Shenzhen, China
Genan Dai
Shenzhen Technology University
Spatio-temporal Data Mining
Bowen Zhang
Shenzhen Technology University
sentiment analysis, stance detection, social computing