Do We Really Need SFT? Prompt-as-Policy over Knowledge Graphs for Cold-start Next POI Recommendation

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Under cold-start conditions, user–POI interactions are extremely sparse, posing critical challenges for existing methods—particularly LLM-based approaches—including prohibitively high supervision costs for fine-tuning, poor generalization, and inflexible static prompts incapable of adapting to diverse user contexts. To address these limitations, we propose KG-RLP (Knowledge Graph-enhanced Reinforcement Learning-guided Prompting), a novel framework that models prompt construction as a learnable policy. Leveraging contextual bandits, KG-RLP dynamically selects and composes relation paths from a knowledge graph to generate evidence cards, enabling frozen large language models to perform adaptive reasoning without parameter updates. Crucially, KG-RLP eliminates the need for supervised fine-tuning, transcending both static prompting and parameter-tuning paradigms. Extensive experiments on three real-world datasets demonstrate that KG-RLP improves Acc@1 by 7.7% on average for inactive users under cold start, while maintaining competitive performance for active users.

📝 Abstract
Next point-of-interest (POI) recommendation is crucial for smart urban services such as tourism, dining, and transportation, yet most approaches struggle under cold-start conditions where user-POI interactions are sparse. Recent efforts leveraging large language models (LLMs) address this challenge through either supervised fine-tuning (SFT) or in-context learning (ICL). However, SFT demands costly annotations and fails to generalize to inactive users, while static prompts in ICL cannot adapt to diverse user contexts. To overcome these limitations, we propose Prompt-as-Policy over knowledge graphs, a reinforcement-guided prompting framework that learns to construct prompts dynamically through contextual bandit optimization. Our method treats prompt construction as a learnable policy that adaptively determines (i) which relational evidences to include, (ii) the number of evidence items per candidate, and (iii) their organization and ordering within prompts. More specifically, we construct a knowledge graph (KG) to discover candidates and mine relational paths, which are transformed into evidence cards that summarize rationales for each candidate POI. The frozen LLM then acts as a reasoning engine, generating recommendations from the KG-discovered candidate set based on the policy-optimized prompts. Experiments on three real-world datasets demonstrate that Prompt-as-Policy consistently outperforms state-of-the-art baselines, achieving an average 7.7% relative improvement in Acc@1 for inactive users, while maintaining competitive performance on active users, without requiring model fine-tuning.
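The abstract frames prompt construction as a contextual bandit problem: given user-context features, a policy picks a prompt-construction action and is rewarded when the frozen LLM's recommendation is correct. As a rough illustration, here is a minimal LinUCB-style sketch; the action space (relation subsets and evidence counts), feature dimensions, and reward signal are illustrative assumptions, not the paper's actual design.

```python
# Minimal LinUCB-style contextual bandit over prompt-construction actions.
# Assumptions (not from the paper): a small discrete action space, 4-d context
# features, and a binary reward standing in for "LLM top-1 hit the true POI".
import numpy as np

class LinUCBPolicy:
    """One ridge-regression model per arm, with a UCB exploration bonus."""

    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # d x d design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted features

    def select(self, x: np.ndarray) -> int:
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                    # per-arm ridge estimate
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Hypothetical arms: (relation types to include, evidence items per candidate).
ACTIONS = [("co-visit", 1), ("co-visit", 3), ("category+co-visit", 3)]

rng = np.random.default_rng(0)
policy = LinUCBPolicy(n_arms=len(ACTIONS), dim=4)
for _ in range(200):
    x = rng.normal(size=4)     # stand-in for user-context features
    arm = policy.select(x)
    # In the real system the reward would come from the LLM's recommendation
    # accuracy; here we simulate arms with different hit rates.
    reward = float(rng.random() < 0.5 + 0.1 * arm)
    policy.update(arm, x, reward)

print(ACTIONS[policy.select(rng.normal(size=4))])
```

The key property this sketch captures is that only the lightweight bandit is trained; the LLM itself stays frozen.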
Problem

Research questions and friction points this paper is trying to address.

Addressing cold-start POI recommendation with sparse user interactions
Overcoming limitations of supervised fine-tuning and static prompts
Dynamically constructing adaptive prompts through reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement-guided prompting framework for dynamic construction
Knowledge graph evidence cards to summarize candidate rationales
Frozen LLM as reasoning engine without fine-tuning
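The "evidence card" idea above — summarizing KG relation paths into a per-candidate rationale for the prompt — can be sketched as follows. The field names, path format, and templating are illustrative assumptions; the paper does not specify this exact layout.

```python
# Hedged sketch of an evidence card: a textual rationale for one candidate POI,
# assembled from KG relation paths and inserted into the LLM prompt.
# The schema below is hypothetical, not the paper's actual format.
from dataclasses import dataclass

@dataclass
class EvidenceCard:
    candidate_poi: str
    relation_paths: list  # (head, relation, tail) triples mined from the KG

    def render(self, max_evidence: int = 3) -> str:
        # In the full framework, the learned policy decides how many paths to
        # keep and in what order; here we simply truncate to max_evidence.
        lines = [f"Candidate: {self.candidate_poi}"]
        for head, rel, tail in self.relation_paths[:max_evidence]:
            lines.append(f"- {head} --{rel}--> {tail}")
        return "\n".join(lines)

card = EvidenceCard(
    candidate_poi="Ramen Bar",
    relation_paths=[
        ("user_42", "visited", "Noodle House"),
        ("Noodle House", "same_category_as", "Ramen Bar"),
        ("Ramen Bar", "located_near", "Central Station"),
    ],
)
print(card.render(max_evidence=2))
```

One card per candidate, concatenated in policy-chosen order, would then form the evidence section of the prompt handed to the frozen LLM.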
Jinze Wang
School of Engineering, Swinburne University of Technology, Melbourne, Australia
Lu Zhang
School of Cybersecurity, Chengdu University of Information Technology, Chengdu, China
Yiyang Cui
School of Computer Science and Technology, Tongji University, Shanghai, China
Zhishu Shen
Wuhan University of Technology
Xingjun Ma
Fudan University
Trustworthy AI, Multimodal AI, Generative AI, Embodied AI
Jiong Jin
Professor, Swinburne University of Technology, Melbourne, Australia
Internet of Things, Network Optimization, Edge Computing, Networked Robotics, Industrial Automation
Tiehua Zhang
School of Computer Science and Technology, Tongji University
AI, Edge Computing/Intelligence, Graph Learning