Modelling Customer Trajectories with Reinforcement Learning for Practical Retail Insights

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the difficulty of obtaining realistic customer trajectories in retail environments. Real trajectory data is costly to collect, while inexpensive heuristics such as the Travelling Salesman Problem (TSP) and Probabilistic Nearest Neighbours (PNN) approximate customers as following shortest or near-shortest paths, even though actual trajectories deviate from shortest paths by 28% on average. The paper casts customer trajectory prediction as a maximum entropy reinforcement learning (RL) problem within an agent-based modelling framework, so that the agent balances reward maximization against stochasticity and better reflects customers with bounded rationality. On real-world trajectory data from a convenience store, the RL-generated trajectories align more closely with observed behaviour than the TSP and PNN baselines and give more accurate estimates of impulse purchase rates and shelf traffic densities. Only the RL-based predictions yield repositioning decisions for impulse products that match those derived from actual trajectories, with comparable estimated profit gains.
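
The balance between reward maximization and stochasticity described above is commonly written as the standard maximum entropy RL objective. The form below is the textbook formulation, not necessarily the exact one used in the paper; the symbols $r$ (reward), $\gamma$ (discount factor), $\alpha$ (entropy weight, or temperature) and $\mathcal{H}$ (policy entropy) are notation assumed here.

```latex
J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t}
  \Big( r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\sum_{a} \pi(a \mid s) \log \pi(a \mid s)
```

As $\alpha \to 0$ the objective reduces to ordinary expected return, which in a navigation task with a per-step cost favours shortest paths. Larger values of $\alpha$ reward more varied, less strictly optimal movement.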

📝 Abstract

Understanding customer movement within retail spaces is essential for optimizing store layouts. Real-world trajectory data can provide highly accurate insights, but collecting it is costly and often infeasible for many retailers. Heuristics such as the Travelling Salesman Problem (TSP) and Probabilistic Nearest Neighbours (PNN) are commonly used as inexpensive approximations, but actual customer trajectories deviate by an average of 28% from shortest paths, highlighting a tradeoff between accuracy and practicality. We propose an agent-based modelling framework that casts customer trajectory prediction as a maximum entropy reinforcement learning (RL) problem, balancing reward maximization with stochasticity to better reflect customers with bounded rationality. Using real-world trajectory data from a convenience store, we show that RL-generated trajectories align more closely with customer behaviour than those produced by TSP and PNN, providing more accurate estimates of impulse purchase rates and shelf traffic densities. Furthermore, only RL-based predictions yield repositioning decisions for impulse products that align with those derived from actual trajectory data, resulting in comparable estimated profit gains. Our work demonstrates that RL provides a practical, behaviourally grounded alternative that bridges the gap between oversimplified heuristics and data-intensive approaches, making accurate layout optimization more accessible. To encourage further research, the source code is available on GitHub.
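
To make the idea of a maximum entropy RL shopper concrete, here is a minimal sketch using tabular soft value iteration on a toy grid. It is not the paper's implementation: the 4x4 grid, the single goal shelf, the step cost, the discount factor, the temperatures and all function names are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical toy store: a 4x4 grid with one target shelf (all values are assumptions).
ROWS, COLS = 4, 4
GOAL = (3, 3)                                   # cell holding the item on the shopping list
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
STEP_COST = -1.0                                # reward for every move
GAMMA = 0.95                                    # discount factor


def step(state, action):
    """Deterministic move; walking into a wall leaves the customer in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    return (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state


def q_values(V, state):
    """One-step lookahead values Q(s, a) for every action."""
    return [STEP_COST + GAMMA * V[step(state, a)] for a in ACTIONS]


def soft_value_iteration(temperature, sweeps=200):
    """Soft Bellman backups: V(s) = T * log(sum_a exp(Q(s, a) / T))."""
    V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    for _ in range(sweeps):
        new_V = {}
        for s in V:
            if s == GOAL:                       # terminal state keeps value 0
                new_V[s] = 0.0
                continue
            q = q_values(V, s)
            m = max(q)                          # subtract the max for numerical stability
            new_V[s] = m + temperature * math.log(
                sum(math.exp((x - m) / temperature) for x in q)
            )
        V = new_V
    return V


def policy(V, state, temperature):
    """Boltzmann policy pi(a | s) proportional to exp(Q(s, a) / T)."""
    q = q_values(V, state)
    m = max(q)
    weights = [math.exp((x - m) / temperature) for x in q]
    total = sum(weights)
    return [w / total for w in weights]


def sample_trajectory(V, start, temperature, rng, max_steps=100):
    """Roll out the stochastic policy until the goal or the step limit is reached."""
    path, state = [start], start
    while state != GOAL and len(path) <= max_steps:
        action = rng.choices(ACTIONS, weights=policy(V, state, temperature))[0]
        state = step(state, action)
        path.append(state)
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    # Keep T * log(4) below the step cost; otherwise wandering beats reaching the goal here.
    for temperature in (0.05, 0.5):
        V = soft_value_iteration(temperature)
        path = sample_trajectory(V, (0, 0), temperature, rng)
        print(f"T={temperature}: {len(path) - 1} steps")
```

At a low temperature the sampled paths stay close to the 6-step shortest route, which is the behaviour shortest-path heuristics assume. Raising the temperature puts more probability on detours, which is one simple way to express bounded rationality.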

Problem

Research questions and friction points this paper is trying to address.

customer trajectories

retail layout optimization

trajectory modelling

bounded rationality

impulse purchase

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning

Customer Trajectory Modelling

Maximum Entropy RL