🤖 AI Summary
Existing learning-based combinatorial optimization (CO) solvers are typically specialized to a single problem, so they generalize poorly across problem types and must be retrained for each new one. Method: We propose UniCO, a unified CO solver built on a single Transformer architecture with one shared parameter set. It formulates heterogeneous CO problems as Markov decision processes (MDPs), tokenizes the resulting trajectories, and introduces a CO-prefix that aggregates static problem features to shorten the token sequences. A two-stage self-supervised training scheme first trains a dynamic prediction model and then uses it as the pre-trained model for policy generation, which addresses the heterogeneity of state and action tokens. Contribution/Results: Evaluated on 10 CO problems, UniCO adapts to new, unseen problems with minimal fine-tuning, in some cases in a few-shot or zero-shot setting, without task-specific architectures. It complements existing neural CO methods that optimize performance on individual problems.
📝 Abstract
Combinatorial Optimization (CO) encompasses a wide range of problems that arise in many real-world scenarios. While significant progress has been made in developing learning-based methods for specialized CO problems, a unified model with a single architecture and parameter set for diverse CO problems remains elusive. Such a model would offer substantial advantages in terms of efficiency and convenience. In this paper, we introduce UniCO, a unified model for solving various CO problems. Inspired by the success of next-token prediction, we frame each problem-solving process as a Markov Decision Process (MDP), tokenize the corresponding sequential trajectory data, and train the model using a Transformer backbone. To reduce token length in the trajectory data, we propose a CO-prefix design that aggregates static problem features. To address the heterogeneity of state and action tokens within the MDP, we employ a two-stage self-supervised learning approach: a dynamic prediction model is trained first and then serves as the pre-trained model for subsequent policy generation. Experiments across 10 CO problems showcase the versatility of UniCO, emphasizing its ability to generalize to new, unseen problems with minimal fine-tuning, in some cases in a few-shot or even zero-shot setting. Our framework offers a valuable complement to existing neural CO methods that focus on optimizing performance for individual problems.
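To make the trajectory tokenization and CO-prefix idea more concrete, the following is a minimal sketch for a toy TSP episode. It is not the authors' implementation: the `Token` type, the token kinds, the choice of "current city" as the state, and the `loss_mask` helper are all assumptions made for illustration, since the abstract does not specify the vocabulary or the state and action encodings. The sketch only shows the general shape: static features are written once into a prefix, the dynamic part alternates state and action tokens, and each of the two training stages could select a different token type as its prediction target.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical token kinds; the actual vocabulary is not given in the abstract.
PREFIX, STATE, ACTION = "prefix", "state", "action"


@dataclass(frozen=True)
class Token:
    kind: str     # one of PREFIX, STATE, ACTION
    value: float  # raw value; discretization into vocabulary ids is omitted


def tokenize_tsp_trajectory(
    coords: List[Tuple[float, float]], tour: List[int]
) -> List[Token]:
    """Flatten one TSP episode into [prefix | s_0, a_0, s_1, a_1, ...]."""
    # Static problem features (city coordinates) appear once, in the prefix.
    tokens = [Token(PREFIX, c) for xy in coords for c in xy]
    # Assumed encoding: state = current city index, action = next city index.
    for current, nxt in zip(tour[:-1], tour[1:]):
        tokens.append(Token(STATE, float(current)))
        tokens.append(Token(ACTION, float(nxt)))
    return tokens


def loss_mask(tokens: List[Token], stage: int) -> List[bool]:
    """Mark which positions are prediction targets in a given stage.

    Assumed reading of the two-stage scheme: stage 1 targets state tokens
    (dynamic prediction), stage 2 targets action tokens (policy generation).
    """
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    target = STATE if stage == 1 else ACTION
    return [t.kind == target for t in tokens]


coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
tour = [0, 1, 2, 3]
tokens = tokenize_tsp_trajectory(coords, tour)
print(len(tokens))  # 8 prefix tokens + 3 (state, action) pairs = 14
```

Under these assumptions, four cities and three transitions yield 8 prefix tokens plus 6 state and action tokens. Repeating the coordinates inside every state would instead add 8 tokens per step, which is the kind of redundancy a prefix that aggregates static features is meant to avoid.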