🤖 AI Summary
Mallows kernels in permutation-space Bayesian optimization have Ω(n²) computational complexity, which hinders scalability. Method: We propose a kernel design paradigm grounded in sorting algorithms, interpreting the Mallows kernel as a kernelized implementation of bubble sort and introducing the Merge Kernel, based on merge sort, which reduces time complexity to Θ(n log n). The Merge Kernel integrates three lightweight descriptors (a shift histogram, a split-pair line, and sliding-window motifs) to jointly model global misplacement, long-range comparisons, and local order structure. Contribution/Results: Across multiple permutation optimization benchmarks, the Merge Kernel consistently outperforms the Mallows kernel while using a more compact representation, improving both optimization efficiency and final performance. This work provides a scalable, interpretable kernel-based tool for Bayesian optimization over discrete combinatorial structures.
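To make the complexity contrast concrete, the following is a minimal sketch, not the paper's implementation. It assumes the standard form of the Mallows kernel, exp(-λ · d), where d is the Kendall tau distance over all n(n-1)/2 pairwise comparisons, and it counts the comparisons a plain top-down merge sort makes on the same input. The function names and the `lam` parameter are hypothetical, and permutations are assumed to be Python lists of distinct integers.

```python
import math
from itertools import combinations


def pairwise_features(perm):
    """Sign of every pairwise comparison: +1 if perm[i] < perm[j], else -1.

    The vector has n*(n-1)/2 entries, which is the quadratic cost
    attributed to the Mallows kernel.
    """
    return [1 if perm[i] < perm[j] else -1
            for i, j in combinations(range(len(perm)), 2)]


def kendall_tau_distance(p, q):
    """Number of position pairs that p and q order differently."""
    return sum(a != b for a, b in zip(pairwise_features(p), pairwise_features(q)))


def mallows_kernel(p, q, lam=1.0):
    """Mallows kernel exp(-lam * d_tau(p, q)); `lam` is an assumed name."""
    return math.exp(-lam * kendall_tau_distance(p, q))


def merge_sort_comparisons(seq):
    """Return (sorted list, number of element comparisons) for merge sort."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, count_left = merge_sort_comparisons(seq[:mid])
    right, count_right = merge_sort_comparisons(seq[mid:])
    merged, i, j = [], 0, 0
    count = count_left + count_right
    while i < len(left) and j < len(right):
        count += 1  # one comparison per loop iteration
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count
```

For a permutation of length 8, `pairwise_features` returns 28 entries, whereas `merge_sort_comparisons` never needs more than 8 · log2(8) = 24 comparisons; the gap widens as n grows. How the paper turns merge-sort comparisons into a fixed-length feature vector is not specified in this summary, so the sketch only illustrates the comparison budget.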
📝 Abstract
Bayesian optimization (BO) is a standard tool for black-box optimization problems. The current state-of-the-art BO approach for permutation spaces relies on the Mallows kernel, an $Ω(n^2)$ representation that explicitly enumerates every pairwise comparison. Inspired by the close relationship between the Mallows kernel and pairwise comparison, we propose a novel framework for generating kernel functions on permutation space based on sorting algorithms. Within this framework, the Mallows kernel can be viewed as a special instance derived from bubble sort. Further, we introduce the \textbf{Merge Kernel}, constructed from merge sort, which replaces the quadratic complexity with $Θ(n\log n)$ to achieve the lowest possible complexity. The resulting feature vector is significantly shorter and can be computed in linearithmic time, yet it still captures meaningful permutation distances. To boost robustness and right-invariance without sacrificing compactness, we further incorporate three lightweight, task-agnostic descriptors: (1) a shift histogram, which aggregates absolute element displacements and supplies a global misplacement signal; (2) a split-pair line, which encodes selected long-range comparisons by aligning elements across the two halves of the whole permutation; and (3) sliding-window motifs, which summarize local order patterns that influence near-neighbor objectives. Our empirical evaluation demonstrates that the proposed kernel consistently outperforms the state-of-the-art Mallows kernel across various permutation optimization benchmarks. The results confirm that the Merge Kernel provides a more compact yet more effective solution for Bayesian optimization in permutation space.
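The three descriptors can be illustrated with a rough sketch. The abstract gives only one-line descriptions, so everything below is one plausible reading rather than the authors' definitions: permutations are assumed to be 0-indexed lists containing 0..n-1, displacement is taken as |perm[i] - i|, the split-pair line compares position i with position i + n//2, and motifs are rank patterns of windows of an assumed width 3. All function names are hypothetical.

```python
from collections import Counter


def shift_histogram(perm):
    """Histogram of absolute displacements |perm[i] - i|.

    Entry d counts how many elements sit d positions away from
    where they would be in the identity permutation (assumed reading).
    """
    hist = [0] * len(perm)
    for i, value in enumerate(perm):
        hist[abs(value - i)] += 1
    return hist


def split_pair_line(perm):
    """Compare position i in the first half with position i + n//2.

    Yields n//2 long-range comparison signs (assumed alignment rule).
    """
    half = len(perm) // 2
    return [1 if perm[i] < perm[i + half] else -1 for i in range(half)]


def window_motifs(perm, width=3):
    """Count the relative-order pattern of each sliding window.

    A window such as (5, 2, 7) is reduced to its rank pattern (1, 0, 2).
    The window width is an assumed parameter.
    """
    counts = Counter()
    for start in range(len(perm) - width + 1):
        window = perm[start:start + width]
        order = sorted(range(width), key=lambda k: window[k])
        pattern = [0] * width
        for rank, k in enumerate(order):
            pattern[k] = rank
        counts[tuple(pattern)] += 1
    return counts
```

Under these assumptions each descriptor costs O(n) for a fixed window width, so appending them to a Θ(n log n) feature vector would not change the overall linearithmic cost.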