TourMart: A Parametric Audit Instrument for Commission Steering in LLM Travel Agents

📅 2026-05-11

🤖 AI Summary

This study addresses the problem of measuring commission-driven bias in large language model (LLM) agents deployed by online travel platforms, where an agent may preferentially recommend higher-commission products. The authors propose TourMart, a parametric audit instrument for LLM-based travel agents. TourMart holds the traveler and bundle fixed and compares a commission-aware prompt against a minimum-disclosure factual template, reading the steering delta off the paired difference. Two governance levers parameterize the audit: λ, the gain on message-induced perception in the traveler's accept/reject decision, and κ, a budget-normalized cap on how far the message can shift perceived welfare. A symmetric six-gate producer audit separates LLM-engineering failures (template collapse, refusal, internal-ID leakage) from genuine commercial steering. At the deployed setting (λ=1, κ=0.05), a Qwen-14B reader shows +7.69 percentage points of steering (exact McNemar p=0.003); a Llama-3.1-8B reader shows +3.50 percentage points in the same direction at n=143 and +2.96 percentage points in an extended sample of n=270 (p=0.008). Across the (λ, κ) grid, both arms pass family-wise scenario-clustered correction (p<0.001 / p=0.008).

📝 Abstract

Online travel agents (Booking, Trip.com, Expedia) have replaced ranked-list interfaces with conversational LLM agents that compress many options into one sentence of advice. Each booking earns the OTA commission and different suppliers pay different rates: the agent has a structural incentive to favor higher-margin recommendations. Whether any deployed agent does this, and by how much, no one can currently measure. Disclosure banners, conversion A/B testing, UI dark-pattern taxonomies, and generic LLM safety scores were built for older interfaces and miss the prose-recommendation surface where the steering happens. We propose TourMart, an applied intelligent-system audit instrument for LLM-OTA commission governance. Two governance levers -- lambda (gain on message-induced perception in the traveler's accept/reject decision) and kappa (budget-normalized cap on how far the message can shift perceived welfare) -- drive a paired counterfactual: holding the traveler and bundle fixed, the steering delta is read off between a commission-aware prompt and a minimum-disclosure factual template. A symmetric six-gate producer audit separates LLM-engineering failures (template collapse, refusal, internal-ID leakage) from genuine commercial steering. At deployed (lambda=1, kappa=0.05), a Qwen-14B reader shows +7.69pp steering (exact McNemar p=0.003); a Llama-3.1-8B reader shows +3.50pp in the same direction at n=143, with an extended-n supplement (n=270) confirming significance (+2.96pp, p=0.008). Across the (lambda, kappa) grid both arms pass family-wise scenario-clustered correction (p<0.001 / p=0.008). TourMart outputs a sentence a compliance report can quote: "at this deployment, 7.7 extra commission-steered recommendations per 100 paired traveler sessions."
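The paired counterfactual and the exact McNemar test named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the boolean encoding of each paired session (whether the high-commission option was recommended under the commission-aware prompt and under the factual template), and the example counts are all assumptions made for illustration, and they are not data from the paper.

```python
from math import comb

def steering_delta_pp(pairs):
    """Steering delta in percentage points over paired sessions.

    Each pair is (aware, factual): whether the high-commission option was
    recommended under the commission-aware prompt and under the
    minimum-disclosure factual template (hypothetical encoding).
    """
    n = len(pairs)
    b = sum(1 for aware, factual in pairs if aware and not factual)  # steered only when commission-aware
    c = sum(1 for aware, factual in pairs if factual and not aware)  # steered only under the factual template
    return 100.0 * (b - c) / n, b, c

def exact_mcnemar_p(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Under the null, each discordant pair falls on either side with
    probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence either way
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Illustrative counts only (not taken from the paper): 100 paired sessions.
pairs = (
    [(True, False)] * 9      # discordant: steered only under the commission-aware prompt
    + [(False, True)] * 1    # discordant: steered only under the factual template
    + [(True, True)] * 20    # concordant
    + [(False, False)] * 70  # concordant
)
delta, b, c = steering_delta_pp(pairs)
p_value = exact_mcnemar_p(b, c)
print(f"steering delta = {delta:+.2f}pp, exact McNemar p = {p_value:.4f}")
```

Only the discordant pairs enter the test, which is why a paired design can detect a shift of a few percentage points at modest n. The delta also reads directly as extra steered recommendations per 100 paired sessions, the form of the compliance sentence quoted at the end of the abstract.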

Problem

Research questions and friction points this paper is trying to address.

commission steering

LLM travel agents

audit instrument

recommendation bias

online travel agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

commission steering

LLM audit

counterfactual evaluation