MOONSHOT: A Framework for Multi-Objective Pruning of Vision and Large Language Models

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing one-shot post-training pruning methods optimize a single objective, either a layer-wise reconstruction error or a second-order Taylor approximation of the training loss, and neither is consistently the most effective across architectures and sparsity levels. This work proposes MOONSHOT, a framework that wraps any such single-objective pruner in a multi-objective formulation that jointly minimizes both objectives. To stay scalable to billion-parameter models without retraining, MOONSHOT introduces an efficient procedure for computing the inverse Hessian, preserving the efficiency of existing one-shot pruners. Combined with state-of-the-art pruning methods on Llama-3.2 and Llama-2, it reduces C4 perplexity by up to 32.6% at 2:4 sparsity and improves mean zero-shot accuracy across seven classification benchmarks by up to 4.9 points. On ImageNet-1k, it improves accuracy by over 5 points for Vision Transformers at 70% sparsity and yields a 4-point gain for ResNet-50 at 90% sparsity.
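
To make the two objectives concrete, here is a minimal Python sketch for a single linear layer. It is an illustration under stated assumptions, not the paper's implementation: the function names, the explicit dense Hessian, and the weighted-sum combination with a hypothetical weight `lam` are all assumptions, since this summary does not say how MOONSHOT combines the objectives.

```python
import numpy as np


def reconstruction_error(W, W_pruned, X):
    """Layer-wise reconstruction error ||X W^T - X W_pruned^T||_F^2.

    W, W_pruned: (d_out, d_in) weight matrices; X: (n, d_in) calibration inputs.
    """
    diff = X @ (W - W_pruned).T
    return float(np.sum(diff ** 2))


def taylor_loss_change(W, W_pruned, grad, hessian):
    """Second-order Taylor estimate of the loss change: g^T d + 0.5 d^T H d.

    grad has the shape of W; hessian is (W.size, W.size) over flattened weights.
    """
    d = (W_pruned - W).ravel()
    return float(grad.ravel() @ d + 0.5 * d @ hessian @ d)


def combined_objective(W, W_pruned, X, grad, hessian, lam=0.5):
    # ASSUMPTION: a simple convex combination of the two objectives.
    # `lam` is a hypothetical trade-off weight, not a parameter from the paper.
    return (lam * reconstruction_error(W, W_pruned, X)
            + (1.0 - lam) * taylor_loss_change(W, W_pruned, grad, hessian))


if __name__ == "__main__":
    W = np.array([[1.0, 2.0], [3.0, 4.0]])
    W_pruned = np.array([[1.0, 0.0], [3.0, 4.0]])  # one weight zeroed out
    X = np.eye(2)                # toy calibration inputs
    grad = np.zeros_like(W)      # gradient is near zero at a trained optimum
    hessian = np.eye(W.size)     # toy Hessian
    print(combined_objective(W, W_pruned, X, grad, hessian))
```

A dense Hessian over all weights is only feasible at toy scale. At billion-parameter scale an approximation is needed, which is why the efficient inverse Hessian procedure matters.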

📝 Abstract

Weight pruning is a common technique for compressing large neural networks. We focus on the challenging post-training one-shot setting, where a pre-trained model is compressed without any retraining. Existing one-shot pruning methods typically optimize a single objective, such as a layer-wise reconstruction loss or a second-order Taylor approximation of the training loss. We highlight that neither objective alone is consistently the most effective across architectures and sparsity levels. Motivated by this insight, we propose MOONSHOT, a general and flexible framework that extends any single-objective pruning method into a multi-objective formulation by jointly optimizing both the layer-wise reconstruction error and the second-order Taylor approximation of the training loss. MOONSHOT acts as a wrapper around existing pruning algorithms. To enable this integration while maintaining scalability to billion-parameter models, we propose modeling decisions and introduce an efficient procedure for computing the inverse Hessian, preserving the efficiency of state-of-the-art one-shot pruners. When combined with state-of-the-art pruning methods on Llama-3.2 and Llama-2 models, MOONSHOT reduces C4 perplexity by up to 32.6% at 2:4 sparsity and improves zero-shot mean accuracy across seven classification benchmarks by up to 4.9 points. On Vision Transformers, it improves accuracy on ImageNet-1k by over 5 points at 70% sparsity, and on ResNet-50, it yields a 4-point gain at 90% sparsity.
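
For readers unfamiliar with the 2:4 sparsity pattern mentioned above, the following Python sketch shows what the constraint means: in every group of four consecutive weights, at most two are nonzero. The helper name `two_four_mask` is hypothetical, and weight magnitude is used only as a stand-in selection rule. It is not MOONSHOT's criterion.

```python
import numpy as np


def two_four_mask(W):
    """Boolean mask keeping the 2 largest-magnitude weights in each group of 4.

    W: (rows, cols) weight matrix whose column count is a multiple of 4.
    ASSUMPTION: magnitude-based selection, used here only to show the pattern.
    """
    rows, cols = W.shape
    if cols % 4 != 0:
        raise ValueError("number of columns must be a multiple of 4")
    groups = np.abs(W).reshape(rows, cols // 4, 4)
    # argsort is ascending, so the last two indices are the largest magnitudes
    keep = np.argsort(groups, axis=-1)[..., 2:]
    mask = np.zeros(groups.shape, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=-1)
    return mask.reshape(rows, cols)


if __name__ == "__main__":
    W = np.array([[0.1, -2.0, 0.3, 4.0, 1.0, 1.5, -0.2, 0.0]])
    mask = two_four_mask(W)
    print(W * mask)  # exactly two nonzeros survive in each group of four
```

A one-shot pruner would replace the magnitude rule with its own saliency score and typically also update the surviving weights. Only the group structure of the mask is fixed by the 2:4 format.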

Problem

Research questions and friction points this paper is trying to address.

multi-objective pruning

one-shot pruning

post-training compression

large language models

vision transformers

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-objective pruning

one-shot pruning

Hessian approximation