Rethink Efficiency Side of Neural Combinatorial Solver: An Offline and Self-Play Paradigm

📅 2026-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the significant bottlenecks in training efficiency, memory consumption, and throughput that hinder neural combinatorial optimization solvers, where existing online learning paradigms struggle to balance performance and resource overhead. To overcome these limitations, the authors propose ECO, a novel offline self-play two-stage training framework tailored for neural combinatorial optimization. The approach first initializes the policy via supervised pre-warming and then iteratively refines it using Direct Preference Optimization (DPO). The authors further integrate the Mamba sequence-modeling architecture to reduce memory usage and introduce a heuristic progressive guidance mechanism to enhance training stability. Evaluated on Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) benchmarks, ECO achieves state-of-the-art solution quality while substantially improving training throughput and reducing memory footprint.

📝 Abstract
We propose ECO, a versatile learning paradigm that enables efficient offline self-play for Neural Combinatorial Optimization (NCO). ECO addresses key limitations in the field through: 1) Paradigm Shift: Moving beyond inefficient online paradigms, we introduce a two-phase offline paradigm consisting of supervised warm-up and iterative Direct Preference Optimization (DPO); 2) Architecture Shift: We deliberately design a Mamba-based architecture to further enhance efficiency in the offline paradigm; and 3) Progressive Bootstrapping: To stabilize training, we employ a heuristic-based bootstrapping mechanism that ensures continuous policy improvement during training. Comparison results on TSP and CVRP show that ECO performs competitively with up-to-date baselines while holding a significant efficiency advantage in memory utilization and training throughput. We provide further in-depth analysis of ECO's efficiency, throughput, and memory usage. Ablation studies validate the rationale behind our designs.
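The abstract's second phase, iterative DPO over self-play solution pairs, can be sketched as follows. This is a minimal illustration of the standard DPO pairwise loss applied to a tour pair (the "winner" being the shorter tour); the function name, toy log-probabilities, and β value are illustrative assumptions, not taken from the paper.

```python
import math

def dpo_pair_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss on one preference pair of tours.

    logp_w / logp_l : log-prob of the winning (shorter) / losing tour
                      under the current policy
    ref_logp_*      : same quantities under the frozen warm-up
                      (reference) policy
    Returns -log sigmoid(beta * preference margin); minimizing it
    pushes the policy toward the preferred tour relative to the
    reference.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy self-play pair over the same TSP instance (values are made up):
loss_close = dpo_pair_loss(logp_w=-10.0, logp_l=-10.5,
                           ref_logp_w=-10.2, ref_logp_l=-10.3)
loss_far = dpo_pair_loss(logp_w=-9.0, logp_l=-12.0,
                         ref_logp_w=-10.2, ref_logp_l=-10.3)
```

A wider preference margin yields a smaller loss (`loss_far < loss_close`), which is what drives the iterative refinement: each self-play round generates new tour pairs, and the policy is updated against the frozen warm-up reference.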
Problem

Research questions and friction points this paper is trying to address.

Neural Combinatorial Optimization
training efficiency
offline learning
memory utilization
training throughput
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offline Self-Play
Direct Preference Optimization
Mamba Architecture
Neural Combinatorial Optimization
Progressive Bootstrapping
Zhenxing Xu
National Key Laboratory of Big Data and Decision, National University of Defense Technology
Zeyuan Ma
South China University of Technology
Meta-Black-Box Optimization, Reinforcement Learning, Learning to Optimize
Weidong Bao
National Key Laboratory of Big Data and Decision, National University of Defense Technology
Hui Yan
Information Support Force Engineering University
Yan Zheng
National Key Laboratory of Big Data and Decision, National University of Defense Technology
Ji Wang
National University of Defense Technology
Distributed Computing, Machine Learning