KLAS: Using Similarity to Stitch Neural Networks for Improved Accuracy-Efficiency Tradeoffs

📅 2026-05-27
🤖 AI Summary
Existing neural network stitching methods rely on heuristics to select stitch configurations, which generalize poorly across model families and yield suboptimal accuracy-efficiency tradeoffs. This work proposes KLAS, a stitch selection framework that uses KL divergence to measure the similarity between intermediate representations of pretrained models. By leveraging activation alignment and an efficient search mechanism, KLAS automatically identifies the most promising stitch configurations, making stitch selection automated and generalizable across model families. Evaluated on ImageNet-1K, KLAS achieves up to 1.21% higher top-1 accuracy at the same computational cost, or maintains comparable accuracy with a 1.33× reduction in FLOPs (1/1.33 ≈ 0.75, i.e. roughly 25% fewer FLOPs).
📝 Abstract
Given the wide range of deployment targets, flexible model selection is essential for optimizing performance within a given compute budget. Recent work demonstrates that stitching pretrained models within a model family enables cost-effective interpolation of the accuracy-efficiency tradeoff space. Stitching transforms intermediate activations from one pretrained model into another, producing a new interpolated stitched network. Such networks provide a pool of deployment options along the accuracy-efficiency spectrum. However, existing stitching approaches often yield suboptimal tradeoffs and lack generalizability, as they primarily rely on heuristics to select stitch configurations. We argue that constructing improved accuracy-efficiency tradeoffs requires explicitly capturing and leveraging the similarity between pretrained models being stitched. To this end, we introduce KLAS, a novel stitch selection framework that automates and generalizes stitch selection across model families by leveraging KL divergence between intermediate representations. KLAS identifies the most promising binary stitches from the $O(k^2n^2)$ possibilities for $k$ pretrained models of depth $n$. Through comprehensive experiments, we demonstrate that KLAS improves the accuracy-efficiency curve of stitched models at the same finetuning cost as baselines. KLAS achieves up to $1.21\%$ higher ImageNet-1K top-1 accuracy at the same computational cost, or maintains accuracy with a $1.33\times$ reduction in FLOPs.
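The selection idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It enumerates the binary stitch candidates for $k$ models of depth $n$ (ordered model pairs times layer pairs, hence $O(k^2n^2)$) and ranks them by a KL divergence score. Several things here are assumptions made for illustration: activations are `(batch, dim)` arrays of equal width, each row is turned into a distribution with a softmax, and a lower KL score is treated as "more promising". The function names `kl_score`, `enumerate_binary_stitches`, and `rank_stitches` are hypothetical.

```python
import itertools
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the feature axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def kl_score(act_a, act_b, eps=1e-12):
    # Mean KL(P_a || P_b) over the batch. Softmax normalization of each
    # row is an assumption of this sketch, not a detail from the paper.
    p = softmax(act_a)
    q = softmax(act_b)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(np.mean(kl))


def enumerate_binary_stitches(k, n):
    # Every (src_model, src_layer, dst_model, dst_layer) with src != dst:
    # k * (k - 1) ordered model pairs times n * n layer pairs.
    return [
        (a, i, b, j)
        for a, b in itertools.permutations(range(k), 2)
        for i in range(n)
        for j in range(n)
    ]


def rank_stitches(acts, top=3):
    # acts[m][l] holds the (batch, dim) activations of model m at layer l.
    k, n = len(acts), len(acts[0])
    scored = [
        (kl_score(acts[a][i], acts[b][j]), (a, i, b, j))
        for a, i, b, j in enumerate_binary_stitches(k, n)
    ]
    scored.sort(key=lambda item: item[0])  # lower KL first (assumption)
    return scored[:top]
```

With 3 models of 4 layers each, `enumerate_binary_stitches(3, 4)` yields 3 · 2 · 16 = 96 candidates. Scoring each candidate needs only forward-pass activations, so a similarity ranking of this kind can narrow the pool before any finetuning is spent on it.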
Problem

Research questions and friction points this paper is trying to address.

stitching
accuracy-efficiency tradeoff
model similarity
neural networks
KL divergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

stitching
KL divergence
accuracy-efficiency tradeoff
neural network interpolation
model similarity