TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work proposes TourPlanner, a framework addressing three challenges in automated travel planning: the trade-off between candidate point-of-interest (POI) recall and filtering efficiency, limited exploration of the solution space due to single-path reasoning, and the difficulty of jointly optimizing hard and soft constraints. TourPlanner integrates a Personalized Recall and Spatial Optimization (PReSO) pipeline, a Competitive Consensus Chain-of-Thought (CCoT) multi-path reasoning mechanism, and a sigmoid-based constraint-gated reinforcement learning strategy. This design prioritizes adherence to hard constraints and optimizes soft constraints, which capture user preferences, once the hard constraints are met. Experimental results on travel planning benchmarks show that TourPlanner significantly outperforms existing methods, achieving state-of-the-art performance in both itinerary feasibility and alignment with user preferences.

📝 Abstract
Travel planning is a sophisticated decision-making process that requires synthesizing multifaceted information to construct itineraries. However, existing travel planning approaches face several challenges: (1) pruning candidate points of interest (POIs) while maintaining a high recall rate is difficult; (2) a single reasoning path restricts exploration of the feasible solution space; (3) simultaneously optimizing hard and soft constraints remains difficult. To address these challenges, we propose TourPlanner, a comprehensive framework featuring multi-path reasoning and constraint-gated reinforcement learning. Specifically, we first introduce a Personalized Recall and Spatial Optimization (PReSO) workflow to construct a spatially aware candidate POI set. Subsequently, we propose Competitive Consensus Chain-of-Thought (CCoT), a multi-path reasoning paradigm that improves exploration of the feasible solution space. To further refine the plan, we integrate a sigmoid-based gating mechanism into the reinforcement learning stage, which dynamically prioritizes soft-constraint satisfaction only after hard constraints are met. Experimental results on travel planning benchmarks demonstrate that TourPlanner achieves state-of-the-art performance, significantly surpassing existing methods in both feasibility and user-preference alignment.
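The abstract does not give the exact form of the gated reward, so the following is only a minimal sketch of how a sigmoid gate could make the soft-constraint term count only once hard constraints are largely satisfied. The function name `gated_reward`, the `threshold` and `sharpness` parameters, the additive combination, and the assumption that both scores lie in [0, 1] are all illustrative choices, not taken from the paper.

```python
import math


def gated_reward(hard_score: float, soft_score: float,
                 threshold: float = 0.9, sharpness: float = 20.0) -> float:
    """Combine hard- and soft-constraint scores with a sigmoid gate.

    Hypothetical formulation (not the paper's): both scores are assumed
    to be in [0, 1]. The gate is close to 0 while hard_score is well
    below `threshold`, so the soft term contributes almost nothing, and
    it opens smoothly as hard_score rises past the threshold.
    """
    gate = 1.0 / (1.0 + math.exp(-sharpness * (hard_score - threshold)))
    return hard_score + gate * soft_score


if __name__ == "__main__":
    # An itinerary violating many hard constraints gains little from soft quality.
    print(gated_reward(hard_score=0.5, soft_score=1.0))
    # A fully feasible itinerary is rewarded for soft-constraint quality too.
    print(gated_reward(hard_score=1.0, soft_score=1.0))
```

Under these assumptions, a smooth gate rather than a hard cutoff keeps the reward continuous in the hard-constraint score, which is one plausible reason to prefer a sigmoid here; the paper's actual motivation and parameterization may differ.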
Problem

Research questions and friction points this paper is trying to address.

travel planning
points of interest
constraint optimization
multi-path reasoning
feasible solution space
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-path reasoning
constraint-gated reinforcement learning
competitive consensus
travel planning
hard and soft constraints
Authors

Yinuo Wang
Tsinghua University
LLM, Reinforcement Learning, Autonomous Driving, Diffusion Model

Mining Tan
Institute of Automation, Chinese Academy of Sciences
Computer Vision, Multimedia, Generative AI

Wenxiang Jiao
Xiaohongshu Inc.

Xiaoxi Li
Renmin University of China
RAG, LLM Reasoning, Deep Research, Agent

Hao Wang
Xiaohongshu Inc.

Xuanyu Zhang
Xiaohongshu Inc.

Yuan Lu
I-squared-R
Blockchains, Distributed Computing, Decentralization

Weiming Dong
MAIS, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences