ToPT: Task-Oriented Prompt Tuning for Urban Region Representation Learning

📅 2026-02-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses limitations of existing urban region representation methods, which are typically task-agnostic and lack both explicit spatial priors and mechanisms for aligning with task semantics. The authors propose ToPT, a two-stage framework. The first stage improves region interaction modeling by incorporating learnable spatial priors, such as distance and centrality, to produce spatially aware region embeddings. The second stage feeds task-specific prompt templates to a frozen multimodal large language model (MLLM) to obtain task semantic vectors, which are then aligned with the region representations via multi-head cross-attention. Evaluated across multiple cities and downstream tasks, the method achieves state-of-the-art performance, with improvements of up to 64.2%, which the authors take as evidence that spatial priors and prompt-driven semantic alignment are effective and complementary.
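To make the idea of spatial priors as learnable attention biases concrete, here is a minimal NumPy sketch of a single self-attention head over region features. It is not the authors' implementation: the function name `spatially_biased_attention`, the distance bucketing, and the scalar centrality weight `cent_weight` are all illustrative assumptions, and the bias values are fixed constants here where a real model would learn them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatially_biased_attention(X, Wq, Wk, Wv, dist, bucket_edges,
                               dist_bias, centrality, cent_weight):
    """Single-head self-attention with additive spatial biases (hypothetical sketch).

    X            : (n, d) region features
    dist         : (n, n) pairwise region distances
    bucket_edges : (B-1,) increasing distance thresholds
    dist_bias    : (B,) one bias per distance bucket (learnable in a real model)
    centrality   : (n,) per-region centrality score
    cent_weight  : scalar weight on the centrality bias (learnable in a real model)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Distance prior: look up a bias for each (query, key) pair by distance bucket.
    buckets = np.digitize(dist, bucket_edges)
    scores = scores + dist_bias[buckets]
    # Centrality prior: shift attention toward (or away from) central key regions.
    scores = scores + cent_weight * centrality[None, :]
    attn = softmax(scores, axis=-1)
    return attn @ V, attn

# Toy usage with random data (all values are placeholders).
rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
coords = rng.uniform(0, 10, size=(n, 2))
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
out, attn = spatially_biased_attention(
    X, Wq, Wk, Wv, dist,
    bucket_edges=np.array([2.0, 5.0]),
    dist_bias=np.array([0.5, 0.0, -0.5]),
    centrality=rng.uniform(size=n),
    cent_weight=0.3,
)
```

Because the biases are added to the attention logits before the softmax, nearby or central regions can receive more weight regardless of how similar their features are, which is the sense in which the prior is explicit.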

📝 Abstract

Learning effective region embeddings from heterogeneous urban data underpins key urban computing tasks (e.g., crime prediction, resource allocation). However, prevailing two-stage methods yield task-agnostic representations, decoupling them from downstream objectives. Recent prompt-based approaches attempt to fix this but introduce two challenges: they often lack explicit spatial priors, causing spatially incoherent inter-region modeling, and they lack robust mechanisms for explicit task-semantic alignment. We propose ToPT, a two-stage framework that delivers spatially consistent fusion and explicit task alignment. ToPT consists of two modules: spatial-aware region embedding learning (SREL) and task-aware prompting for region embeddings (Prompt4RE). SREL employs a Graphormer-based fusion module that injects spatial priors (distance and regional centrality) as learnable attention biases to capture coherent, interpretable inter-region interactions. Prompt4RE performs task-oriented prompting: a frozen multimodal large language model (MLLM) processes task-specific templates to obtain semantic vectors, which are aligned with region embeddings via multi-head cross-attention for stable task conditioning. Experiments across multiple tasks and cities show state-of-the-art performance, with improvements of up to 64.2%, validating the necessity and complementarity of spatial priors and prompt-region alignment. The code is available at https://github.com/townSeven/Prompt4RE.git.
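The prompt-region alignment step can be sketched as multi-head cross-attention in which region embeddings act as queries and task semantic vectors act as keys and values. The NumPy code below is an illustrative sketch only: the function name `multi_head_cross_attention`, the residual connection, and all dimensions are assumptions, and the task vectors are random stand-ins for what a frozen MLLM would produce from a prompt template.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(regions, task_vecs, Wq, Wk, Wv, Wo, num_heads):
    """Condition region embeddings on task semantic vectors (hypothetical sketch).

    regions   : (n, d_model) region embeddings, used as queries
    task_vecs : (m, d_task) task semantic vectors, used as keys and values
    Wq, Wo    : (d_model, d_model); Wk, Wv : (d_task, d_model)
    """
    n, d_model = regions.shape
    m = task_vecs.shape[0]
    d_head = d_model // num_heads

    def split(x, rows):
        # (rows, d_model) -> (num_heads, rows, d_head)
        return x.reshape(rows, num_heads, d_head).transpose(1, 0, 2)

    Q = split(regions @ Wq, n)
    K = split(task_vecs @ Wk, m)
    V = split(task_vecs @ Wv, m)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n, m)
    attn = softmax(scores, axis=-1)
    heads = attn @ V                                      # (heads, n, d_head)
    merged = heads.transpose(1, 0, 2).reshape(n, d_model)
    # Residual connection (an assumption) keeps the original spatial embedding.
    return regions + merged @ Wo, attn

# Toy usage: 6 regions, 4 task vectors standing in for frozen-MLLM outputs.
rng = np.random.default_rng(1)
n, m, d_model, d_task, num_heads = 6, 4, 16, 32, 4
regions = rng.normal(size=(n, d_model))
task_vecs = rng.normal(size=(m, d_task))  # placeholder, no MLLM is called here
Wq = rng.normal(size=(d_model, d_model)) * 0.1
Wk = rng.normal(size=(d_task, d_model)) * 0.1
Wv = rng.normal(size=(d_task, d_model)) * 0.1
Wo = rng.normal(size=(d_model, d_model)) * 0.1
aligned, attn_w = multi_head_cross_attention(
    regions, task_vecs, Wq, Wk, Wv, Wo, num_heads
)
```

In this sketch only the projection matrices would be trained; the task vectors are treated as fixed inputs, mirroring the frozen MLLM described in the abstract.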

Problem

Research questions and friction points this paper is trying to address.

urban region representation learning

task-agnostic representations

spatial priors

task-semantic alignment

prompt-based learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

prompt tuning

spatial priors

task-oriented alignment