🤖 AI Summary
This paper identifies an inherent misalignment in unsupervised combinatorial optimization (UCO): during training, models optimize continuous probabilistic solutions with differentiable losses, whereas testing relies on non-differentiable derandomization to obtain deterministic solutions, so reducing the training loss does not necessarily improve actual test performance. We are the first to systematically formulate this misalignment and validate it empirically across diverse combinatorial optimization tasks. To mitigate it, we integrate differentiable derandomization into the end-to-end training pipeline, enabling consistent optimization of both probabilistic solution modeling and deterministic solution generation. Experiments show that our approach significantly strengthens the correlation between training objectives and test performance, suggesting a new design paradigm for UCO. At the same time, we uncover nontrivial challenges introduced by differentiable derandomization, particularly gradient instability, motivating further investigation into robust training strategies for unsupervised discrete optimization.
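To make the summary's key idea concrete, here is a minimal sketch of one common way to make a thresholding-style derandomization step differentiable: replace the hard threshold on a logit with a temperature-sharpened sigmoid. This is my own illustration of the general technique (in the spirit of temperature annealing, as in Gumbel-softmax-style relaxations), not necessarily the paper's exact scheme; the function names and the temperature values are my choices. It also shows, in miniature, where the gradient instability mentioned above can come from.

```python
# Sketch (illustrative, not the paper's implementation): a differentiable
# surrogate for hard thresholding via a temperature-sharpened sigmoid.
import math

def hard_threshold(theta):
    """Non-differentiable test-time rounding of a logit to a 0/1 decision."""
    return 1.0 if theta >= 0.0 else 0.0

def soft_threshold(theta, tau):
    """Differentiable surrogate sigmoid(theta / tau); tau -> 0 recovers hard."""
    return 1.0 / (1.0 + math.exp(-theta / tau))

def soft_threshold_grad(theta, tau):
    """d soft_threshold / d theta = p * (1 - p) / tau."""
    p = soft_threshold(theta, tau)
    return p * (1.0 - p) / tau

# (a) At low temperature, the soft decision matches the hard one:
assert abs(soft_threshold(2.0, tau=0.05) - hard_threshold(2.0)) < 1e-6
assert abs(soft_threshold(-2.0, tau=0.05) - hard_threshold(-2.0)) < 1e-6

# (b) ... but near the decision boundary (theta = 0) the gradient scales
# like 1/tau, a simple source of the gradient instability noted above:
g_mild = soft_threshold_grad(0.0, tau=1.0)    # 0.25
g_sharp = soft_threshold_grad(0.0, tau=0.01)  # ~25, i.e. 100x larger
assert abs(g_sharp / g_mild - 100.0) < 1e-9
```

The trade-off in (b) is the crux: sharper surrogates track the deterministic decision more faithfully, but their gradients blow up near undecided entries.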
📝 Abstract
In unsupervised combinatorial optimization (UCO), training aims to produce, for each training instance, continuous decisions that are promising in a probabilistic sense; this relaxation enables end-to-end training on problems that are originally discrete and non-differentiable. At test time, for each test instance, derandomization is typically applied to the continuous decisions to obtain the final deterministic decisions. Researchers have developed increasingly powerful test-time derandomization schemes to improve both the empirical performance and the theoretical guarantees of UCO methods. However, we notice a misalignment between training and testing in existing UCO methods: lower training losses do not necessarily entail better post-derandomization performance, even on the training instances themselves, where no data distribution shift exists. Empirically, we indeed observe such undesirable cases. As a preliminary remedy, we explore better aligning training and testing by incorporating a differentiable version of derandomization into training. Our empirical study shows that this idea indeed improves training-test alignment, but it also introduces nontrivial challenges into training.