🤖 AI Summary
This work addresses the challenges of generating user intent sequences in complex spatiotemporal scenarios, specifically the logical incoherence, physical infeasibility, and high inference latency associated with large language models (LLMs). To overcome these limitations, the authors propose GPlan, a framework that transfers LLM-based planning capabilities to a lightweight sequence generation model through progressive implicit chain-of-thought distillation and a spatiotemporal counterfactual Direct Preference Optimization (DPO) mechanism. This approach preserves the LLM's reasoning abilities while increasing sensitivity to real-world spatiotemporal constraints. Both offline evaluations and online A/B tests show that GPlan outperforms baseline methods in sequence coherence, contextual responsiveness, and execution feasibility. The framework has been deployed in Amap’s production services.
📝 Abstract
Real-world user behavior rarely consists of isolated actions; instead, it often forms intent flows governed by spatiotemporal dependencies. To provide integrated service recommendations, we focus on the task of Generative Spatiotemporal Intent Sequence Recommendation (GSISR), which aims to generate intent sequences that are logically coherent and physically executable within complex spatiotemporal contexts. While LLMs offer strong reasoning potential for GSISR, direct industrial deployment is limited by high inference latency and context-mismatched or physically infeasible plans. To address these challenges, we propose a generative framework, GPlan, that internalizes LLM reasoning into lightweight models through two components. First, to enable reasoning under strict latency constraints, we introduce Progressive Implicit CoT Distillation, which compresses explicit reasoning processes into reserved latent tokens, allowing small models to inherit complex planning logic without generating long reasoning text. Second, to address the disconnect between general knowledge and real-world constraints, we design Spatiotemporal Counterfactual DPO. By aligning the model with counterfactual context-plan pairs, we improve sensitivity to spatiotemporal context and reduce context-mismatched plans. Offline experiments and online A/B testing demonstrate that our approach improves sequence coherence and context responsiveness. Our implementation and the anonymized GSISR dataset are available at https://github.com/alibaba/GPlan.
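To make the Spatiotemporal Counterfactual DPO idea more concrete, the following is a minimal sketch, not the authors' implementation. It uses the standard DPO objective on sequence log-probabilities and assumes one plausible way of building counterfactual context-plan pairs: the plan that fits a context is preferred there, and the plan that fits a perturbed (counterfactual) context is preferred under that perturbed context. The names `PreferencePair`, `make_counterfactual_pairs`, and `dpo_loss`, as well as the example contexts and plans, are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One DPO training example: a context with a preferred and a dispreferred plan."""
    context: str
    chosen: str
    rejected: str


def make_counterfactual_pairs(context, plan, cf_context, cf_plan):
    """Build two mirrored preference pairs (an assumed construction, not the paper's).

    Under the original context its own plan is preferred; under the
    counterfactual context the counterfactual plan is preferred.
    """
    return [
        PreferencePair(context=context, chosen=plan, rejected=cf_plan),
        PreferencePair(context=cf_context, chosen=cf_plan, rejected=plan),
    ]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single pair, given sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l))))
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical example: the same user, two different spatiotemporal contexts.
pairs = make_counterfactual_pairs(
    context="Friday 19:00, downtown, raining",
    plan="ride-hailing -> restaurant -> cinema",
    cf_context="Saturday 09:00, suburb, sunny",
    cf_plan="drive -> park -> cafe",
)

# Toy log-probabilities: the policy favors the chosen plan more than the reference does.
loss_good = dpo_loss(-1.0, -5.0, -2.0, -2.0)
# Toy log-probabilities: the policy favors the rejected plan instead.
loss_bad = dpo_loss(-5.0, -1.0, -2.0, -2.0)
```

In this sketch the loss drops below log 2 when the policy raises the chosen plan relative to the reference model and rises above it in the opposite case, which is the pressure that would push a model to change its plan when the spatiotemporal context changes.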