TRACE: Tourism Recommendation with Accountable Citation Evidence

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses three limitations of existing conversational recommender systems for tourism: recommendations lack verifiable grounding, preferences are poorly adapted across turns, and systems rarely recover after a user rejects a suggestion. As a result, no current benchmark tests accuracy, traceability, and adaptability together. To close this gap, we introduce TRACE, a benchmark of 10,000 multi-turn tourism dialogues over 2,400 Yelp points of interest and 34,208 user reviews across eight U.S. cities, in which each recommendation is justified by verbatim review spans and dialogues include explicit rejection turns. Systems are evaluated along three dimensions (Accuracy, Grounding, and Recovery) using 14 retrieval, planning, and LLM baselines and 25 metrics. The Grounding Score agrees with human citation precision (Spearman ρ = 0.80, p < 10^-20), and paired t-tests reproduce the per-baseline ranking (p < 0.01 on the dominant contrasts).

📝 Abstract

Tourism is a high-stakes setting for conversational recommender systems (CRS): a plausible-sounding suggestion can waste real money and trip time once a traveler acts on it. Existing CRS benchmarks primarily evaluate systems with a single Recall@k score over entity mentions, and tourism-specific resources add spatial or knowledge-graph context, yet none of them couple multi-turn recommendation with verbatim review-span evidence and rejection recovery. This leaves an evaluation gap for tourism recommendation that is simultaneously trustworthy, verifiable, and adaptive: recommend the right point of interest (POI) for multi-aspect preferences (such as cuisine, price, atmosphere, walking distance), justify each suggestion with verifiable evidence from prior visitors so the traveler can act without trial and error, and recover when the first recommendation is rejected mid-dialogue. We introduce TRACE, where each item is a multi-turn tourism recommendation dialogue with review-span citations and explicit rejection turns: 10,000 dialogues over 2,400 Yelp POIs and 34,208 reviews across eight U.S. cities, paired with 14 retrieval, planning, and LLM baselines, along with 25 metrics organized under Accuracy, Grounding, and Recovery. Across these baselines, TRACE reveals the Three-Competency Gap: LLM Zero-Shot leads in closed-set Recall@1 and rejection recovery but cites less densely than retrievers; non-LLM retrievers achieve surface-verbatim grounding but with low accuracy; Multi-Review Synthesis fails at recovery. The Grounding Score agrees with human citation precision (Spearman rho=+0.80, p<10^-20), and paired t-tests reproduce the per-baseline ranking (p<0.01 on the dominant contrasts). TRACE reframes accountable tourism recommendation as a joint target (right POI, verifiable evidence, adaptive repair) rather than a single-axis leaderboard.
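To make two of the quantities named in the abstract concrete, here is a minimal sketch of Recall@k for a single dialogue turn and of a simple check that cited spans appear verbatim in source reviews. The abstract does not define the Grounding Score, so `verbatim_citation_precision` is a hypothetical stand-in rather than the paper's metric. The function names, the whitespace and case normalization, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def recall_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """Return 1.0 if the target POI appears in the top-k ranked list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting does not block a match."""
    return " ".join(text.lower().split())


def verbatim_citation_precision(citations: Sequence[str], reviews: Sequence[str]) -> float:
    """Fraction of cited spans found verbatim in at least one review.

    Hypothetical stand-in: the paper's Grounding Score is not specified here.
    """
    if not citations:
        return 0.0
    normalized_reviews = [_normalize(r) for r in reviews]
    hits = sum(
        any(_normalize(span) in review for review in normalized_reviews)
        for span in citations
    )
    return hits / len(citations)


# Hypothetical toy data, not taken from TRACE.
reviews = ["The tacos were great and the patio is quiet.", "A bit pricey, but worth it."]
citations = ["the patio is quiet", "cheap and cheerful"]
ranked = ["poi_17", "poi_03", "poi_42"]

print(recall_at_k(ranked, "poi_03", k=1))                # 0.0
print(recall_at_k(ranked, "poi_03", k=2))                # 1.0
print(verbatim_citation_precision(citations, reviews))   # 0.5
```

Keeping the two functions separate mirrors the abstract's point that accuracy and grounding are distinct axes: a system can rank the right POI first while citing text that no reviewer wrote, or the reverse.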

Problem

Research questions and friction points this paper is trying to address.

conversational recommender systems

tourism recommendation

verifiable evidence

rejection recovery

multi-turn dialogue

Innovation

Methods, ideas, or system contributions that make the work stand out.

conversational recommender systems

verifiable citation

rejection recovery