🤖 AI Summary
This paper addresses critical limitations—poor scalability, opaque decision-making, and unstable day-to-day behavioral patterns—in LLM-driven agent-based traffic models. To this end, we propose a Large Language Model-Guided Representative Agent Reinforcement Learning (LLM-RA-RL) framework. Our method represents each homogeneous traveler population with a single LLM-powered agent and integrates an interpretable hybrid policy update with a progressively decaying step size, thereby ensuring decision transparency while substantially reducing computational overhead. Experimental results demonstrate rapid convergence to user equilibrium across diverse traffic scenarios. The framework robustly reproduces well-established behavioral phenomena—including the decoy effect and income-sensitive willingness-to-pay heterogeneity—while simultaneously achieving high simulation efficiency, stable and plausible day-to-day dynamics, and full result interpretability.
📝 Abstract
Large language models (LLMs) are increasingly used as behavioral proxies for self-interested travelers in agent-based traffic models. Although more flexible and generalizable than conventional models, these approaches remain limited in practice by poor scalability, since one LLM call is required for every traveler. LLM agents have also been found to make opaque choices and to produce unstable day-to-day dynamics. To address these challenges, we propose to model each homogeneous traveler group facing the same decision context with a single representative LLM agent that behaves like the population's average, maintaining and updating a mixed strategy over routes that coincides with the group's aggregate flow proportions. Each day, the LLM reviews the travel experience and flags routes that received positive reinforcement and that it intends to use more often; an interpretable update rule then converts this judgment into strategy adjustments using a tunable, progressively decaying step size. The representative-agent design improves scalability, while the separation of reasoning from updating clarifies the decision logic and stabilizes learning. In classic traffic assignment settings, we find that the proposed approach converges rapidly to the user equilibrium. In richer settings with income heterogeneity, multi-criteria costs, and multi-modal choices, the generated dynamics remain stable and interpretable, reproducing plausible behavioral patterns well documented in psychology and economics, such as the decoy effect in toll versus non-toll road selection, and higher willingness-to-pay for convenience among higher-income travelers when choosing between driving, transit, and park-and-ride options.
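To make the day-to-day mechanism concrete, the sketch below shows one possible form of the strategy update described above: the LLM's judgment is reduced to a set of flagged routes, and a decaying step size shifts probability mass toward them. This is a hypothetical illustration, not the paper's exact rule; the function name `update_strategy`, the decay schedule `eta0 / t`, and the uniform target over flagged routes are all assumptions.

```python
import numpy as np

def update_strategy(p, flagged, t, eta0=0.5):
    """One illustrative day-to-day update (hypothetical rule, not the paper's).

    p       : current mixed strategy over routes (nonnegative, sums to 1)
    flagged : boolean mask of routes the LLM wants to use more often
    t       : day index (1-based); step size decays as eta0 / t
    """
    p = np.asarray(p, dtype=float)
    eta = eta0 / t  # progressively decaying step size
    if flagged.any():
        # Shift mass toward the flagged routes (uniform target, by assumption).
        target = flagged.astype(float) / flagged.sum()
    else:
        target = p  # no flags: strategy is left unchanged
    p_new = (1 - eta) * p + eta * target
    return p_new / p_new.sum()  # renormalize to a valid distribution

# Example: flagging route 0 on day 1 moves probability mass toward it.
p = update_strategy([0.5, 0.3, 0.2], np.array([True, False, False]), t=1)
print(p)  # route 0's probability increases; vector still sums to 1
```

Because the step size decays over days, early experiences move the strategy substantially while later ones only fine-tune it, which is one standard way such averaged updates are driven toward a fixed point like the user equilibrium.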