🤖 AI Summary
This work addresses two problems in user behavior simulation: over-rationalization caused by incomplete information, and the difficulty single-model approaches face in jointly capturing interpretable preferences and implicit statistical patterns. To this end, we propose the Policy-Guided Hybrid Simulation (PGHS) framework, a dual-process architecture. PGHS extracts transferable decision policies from user trajectories and uses them as a shared alignment layer. This layer coordinates a symbolic reasoning branch powered by large language models with a statistical fitting branch based on machine learning, so that each branch corrects the other. Evaluated on a real-world Meituan dataset of more than 26,000 trajectories across 101 merchants, PGHS reduces group-level simulation error to 8.80%, outperforming the best reasoning-based and fitting-based baselines by 45.8% and 40.9%, respectively, while balancing interpretability and generalization.
📝 Abstract
Simulating group-level user behavior enables scalable counterfactual evaluation of merchant strategies without costly online experiments. However, building a trustworthy simulator faces two structural challenges. First, information incompleteness causes reasoning-based simulators to over-rationalize when unobserved factors such as offline context and implicit habits are missing. Second, mechanism duality requires capturing both interpretable preferences and implicit statistical regularities, which no single paradigm achieves alone. We propose Policy-Guided Hybrid Simulation (PGHS), a dual-process framework that mines transferable decision policies from behavioral trajectories and uses them as a shared alignment layer. This layer anchors an LLM-based reasoning branch, preventing over-rationalization, and an ML-based fitting branch that absorbs implicit regularities. Group-level predictions from both branches are then fused so that each corrects the other. We deploy PGHS on Meituan with 101 merchants and over 26,000 trajectories. PGHS achieves a group simulation error of 8.80%, improving over the best reasoning-based and fitting-based baselines by 45.8% and 40.9%, respectively.
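The abstract states that group-level predictions from the two branches are fused, but it does not specify the fusion rule or how group simulation error is computed. The sketch below is one minimal reading under loudly stated assumptions. It assumes each branch outputs a single group-level rate per merchant, that fusion is a convex combination with a weight `alpha` tuned on held-out data, and that group error is the mean absolute relative error across merchants. The names `fuse`, `group_error`, and `pick_alpha` are hypothetical and are not the PGHS implementation.

```python
# Hypothetical sketch of fusing group-level predictions from two simulator branches.
# Assumptions (not from the paper): one scalar rate per merchant per branch,
# convex-combination fusion with weight alpha, and mean absolute relative error.


def fuse(reasoning: list[float], fitting: list[float], alpha: float) -> list[float]:
    """Blend per-merchant predictions: alpha * reasoning + (1 - alpha) * fitting."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(reasoning) != len(fitting):
        raise ValueError("both branches must cover the same merchants")
    return [alpha * r + (1.0 - alpha) * f for r, f in zip(reasoning, fitting)]


def group_error(predicted: list[float], observed: list[float]) -> float:
    """Mean absolute relative error; observed rates must be nonzero."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty inputs")
    return sum(abs(p - o) / o for p, o in zip(predicted, observed)) / len(observed)


def pick_alpha(reasoning: list[float], fitting: list[float],
               observed: list[float], steps: int = 11) -> float:
    """Grid-search the fusion weight in [0, 1] that minimizes group error."""
    grid = [i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda a: group_error(fuse(reasoning, fitting, a), observed))


if __name__ == "__main__":
    # Toy numbers for illustration only, not results from the paper.
    observed = [0.20, 0.40, 0.10]
    reasoning = [0.26, 0.46, 0.13]  # toy branch that overestimates every merchant
    fitting = [0.16, 0.35, 0.08]    # toy branch that underestimates every merchant
    alpha = pick_alpha(reasoning, fitting, observed)
    fused = fuse(reasoning, fitting, alpha)
    print(f"alpha={alpha:.1f}, error={group_error(fused, observed):.3f}")
```

Because the two toy branches err in opposite directions, an intermediate `alpha` gives lower error than either branch alone. This is the intuition behind complementary correction. The policy-guided alignment layer that PGHS places between the branches is not modeled here.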