MOONSHOT : A Framework for Multi-Objective Pruning of Vision and Large Language Models

📅 2026-04-14

🤖 AI Summary
Existing one-shot post-training pruning methods optimize a single objective, either a layer-wise reconstruction error or a second-order Taylor approximation of the training loss, and neither objective alone is consistently the most effective across architectures and sparsity levels. This work proposes MOONSHOT, a framework that wraps an existing one-shot pruner and extends it into a multi-objective formulation that jointly minimizes both objectives. To stay scalable to billion-parameter models without retraining, MOONSHOT introduces an efficient procedure for computing the inverse Hessian. Reported results: combined with state-of-the-art pruners on Llama-3.2 and Llama-2, it reduces C4 perplexity by up to 32.6% at 2:4 sparsity and improves mean zero-shot accuracy across seven classification benchmarks by up to 4.9 points; on ImageNet-1k, it improves Vision Transformer accuracy by over 5 points at 70% sparsity and yields a 4-point gain for ResNet-50 at 90% sparsity.

📝 Abstract
Weight pruning is a common technique for compressing large neural networks. We focus on the challenging post-training one-shot setting, where a pre-trained model is compressed without any retraining. Existing one-shot pruning methods typically optimize a single objective, such as a layer-wise reconstruction loss or a second-order Taylor approximation of the training loss. We highlight that neither objective alone is consistently the most effective across architectures and sparsity levels. Motivated by this insight, we propose MOONSHOT, a general and flexible framework that extends any single-objective pruning method into a multi-objective formulation by jointly optimizing both the layer-wise reconstruction error and second-order Taylor approximation of the training loss. MOONSHOT acts as a wrapper around existing pruning algorithms. To enable this integration while maintaining scalability to billion-parameter models, we propose modeling decisions and introduce an efficient procedure for computing the inverse Hessian, preserving the efficiency of state-of-the-art one-shot pruners. When combined with state-of-the-art pruning methods on Llama-3.2 and Llama-2 models, MOONSHOT reduces C4 perplexity by up to 32.6% at 2:4 sparsity and improves zero-shot mean accuracy across seven classification benchmarks by up to 4.9 points. On Vision Transformers, it improves accuracy on ImageNet-1k by over 5 points at 70% sparsity, and on ResNet-50, it yields a 4-point gain at 90% sparsity.
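
To make the two objectives named in the abstract concrete, here is a minimal NumPy sketch for a single linear layer. It is not the MOONSHOT algorithm: the weighted sum with the hypothetical parameter `lam`, the diagonal Hessian, the magnitude-based scores, and all function names are assumptions chosen only for illustration. The abstract does not say how the two objectives are combined or how the inverse Hessian is computed.

```python
import numpy as np


def reconstruction_error(W, W_pruned, X):
    """Layer-wise reconstruction error ||X W^T - X W_pruned^T||_F^2.

    W, W_pruned: (out_features, in_features); X: (n_samples, in_features).
    """
    diff = X @ (W - W_pruned).T
    return float(np.sum(diff ** 2))


def taylor_loss_change(W, W_pruned, grad, hess_diag):
    """Second-order Taylor estimate of the change in training loss.

    Assumption: the Hessian is approximated by its diagonal (hess_diag).
    """
    d = (W_pruned - W).ravel()
    return float(grad.ravel() @ d + 0.5 * np.sum(hess_diag.ravel() * d * d))


def mask_2_of_4(scores):
    """2:4 sparsity: keep the 2 highest-scoring weights in each group of 4."""
    rows, cols = scores.shape
    assert cols % 4 == 0, "in_features must be divisible by 4"
    groups = scores.reshape(rows, cols // 4, 4)
    order = np.argsort(groups, axis=-1)  # ascending within each group
    mask = np.ones_like(groups, dtype=bool)
    # Drop the two lowest-scoring positions of every group.
    np.put_along_axis(mask, order[..., :2], False, axis=-1)
    return mask.reshape(rows, cols)


def combined_objective(W, W_pruned, X, grad, hess_diag, lam=0.5):
    """Hypothetical scalarization: a weighted sum of the two objectives."""
    return (lam * reconstruction_error(W, W_pruned, X)
            + (1.0 - lam) * taylor_loss_change(W, W_pruned, grad, hess_diag))


# Toy data standing in for one layer and a small calibration batch.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(32, 16))
grad = 0.01 * rng.normal(size=W.shape)        # stand-in for dL/dW
hess_diag = np.abs(rng.normal(size=W.shape))  # stand-in for diag(H)

W_pruned = W * mask_2_of_4(np.abs(W))         # magnitude scores (assumption)
print(reconstruction_error(W, W_pruned, X))
print(taylor_loss_change(W, W_pruned, grad, hess_diag))
print(combined_objective(W, W_pruned, X, grad, hess_diag, lam=0.5))
```

Setting `lam=1.0` or `lam=0.0` recovers the two single-objective scores, which is the contrast the abstract draws: a mask that looks good under one objective is not guaranteed to look good under the other.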
Problem

Research questions and friction points this paper is trying to address.

multi-objective pruning
one-shot pruning
post-training compression
large language models
vision transformers
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-objective pruning
one-shot pruning
Hessian approximation
model compression
post-training pruning