🤖 AI Summary
For NP-hard routing problems (e.g., TSP, CVRP), existing neural solvers adapt poorly at inference time: they cannot fully exploit the available computational budget or instance-specific information. This work proposes MEMENTO, a neural solver framework that adds a dynamic memory mechanism at inference. MEMENTO updates the action distribution online using feedback from previous decisions, and it can be combined zero-shot with diversity-based solvers. The method draws on reinforcement learning, autoregressive modeling, memory-augmented networks, and online policy adaptation. Evaluated on 12 benchmark tasks, MEMENTO sets a new state of the art (SOTA) on 11, outperforming both tree search and policy-gradient fine-tuning. It also scales to large TSP and CVRP instances and is data-efficient.
📝 Abstract
Combinatorial Optimization is crucial to numerous real-world applications, yet it remains challenging due to its (NP-)hard nature. Among existing approaches, heuristics often offer the best trade-off between quality and scalability, making them suitable for industrial use. While Reinforcement Learning (RL) offers a flexible framework for designing heuristics, its adoption over handcrafted heuristics remains incomplete within industrial solvers. Existing learned methods still lack the ability to adapt to specific instances and to fully leverage the available computational budget. The current best methods rely either on a collection of pre-trained policies or on data-inefficient fine-tuning, and hence fail to fully utilize newly available information within the constraints of the budget. In response, we present MEMENTO, an approach that leverages memory to improve the adaptation of neural solvers at inference time. MEMENTO enables the action distribution to be updated dynamically based on the outcome of previous decisions. We validate its effectiveness on benchmark problems, in particular Traveling Salesman and Capacitated Vehicle Routing, demonstrating its superiority over tree search and policy-gradient fine-tuning, and showing that it can be zero-shot combined with diversity-based solvers. We successfully train all RL auto-regressive solvers on large instances, and show that MEMENTO can scale and is data-efficient. Overall, MEMENTO pushes the state of the art on 11 out of 12 evaluated tasks.
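The idea of updating the action distribution from the outcome of previous decisions can be illustrated with a toy sketch. This is not the authors' implementation: the memory format, the function names, the `step_size` parameter, and the fixed advantage-weighted update rule are all assumptions made for illustration, and the abstract does not specify how MEMENTO computes its update.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def memory_adjusted_probs(base_logits, memory, step_size=1.0):
    """Shift a base policy's logits using outcomes of earlier attempts.

    base_logits: logits from a (hypothetical) pre-trained policy at one state.
    memory: list of (action_index, return) pairs from previous attempts at
            this state (hypothetical format, assumed for this sketch).
    """
    if not memory:
        # No feedback yet: fall back to the base policy.
        return softmax(base_logits)

    # Mean return over remembered attempts serves as a simple baseline.
    baseline = sum(ret for _, ret in memory) / len(memory)

    adjusted = list(base_logits)
    for action, ret in memory:
        # Raise logits of actions that beat the baseline, lower the others.
        adjusted[action] += step_size * (ret - baseline)
    return softmax(adjusted)


# Toy state with three candidate actions; returns are negative tour lengths.
base_logits = [0.0, 0.0, 0.0]
memory = [(0, -10.0), (1, -8.0), (2, -12.0)]
probs = memory_adjusted_probs(base_logits, memory)
```

In this toy example the base policy is uniform, and after the update the action that previously led to the shortest tour (index 1) receives the highest probability. A learned update would replace the fixed rule with a trained module, which is outside the scope of this sketch.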