Pluggable Pruning with Contiguous Layer Distillation for Diffusion Transformers

📅 2025-11-20
🤖 AI Summary
To address the high parameter count and computational cost of diffusion Transformers (DiTs), which hinder deployment in resource-constrained settings, this work proposes a plug-and-play structured pruning framework. It combines linear probing with first-order differential analysis of inter-layer similarity to precisely identify redundant layer intervals, and introduces a teacher-student alternating consecutive-layer distillation mechanism that transfers knowledge across multiple pruning ratios without per-configuration retraining. The method performs joint depth-wise and width-wise compression within a single-stage training process, enabling flexible model slimming. Experiments on multimodal DiTs demonstrate a 50% parameter reduction with less than 3% degradation in key generation metrics (FID and LPIPS), preserving robust visual quality. The core innovations are a differentiable analytical paradigm for redundancy assessment and a modular, plug-and-play distillation architecture, significantly enhancing DiT deployability and adaptability.

📝 Abstract
Diffusion Transformers (DiTs) have shown exceptional performance in image generation, yet their large parameter counts incur high computational costs, impeding deployment in resource-constrained settings. To address this, we propose Pluggable Pruning with Contiguous Layer Distillation (PPCL), a flexible structured pruning framework specifically designed for DiT architectures. First, we identify redundant layer intervals through a linear probing mechanism combined with first-order differential trend analysis of similarity metrics. Subsequently, we propose a plug-and-play teacher-student alternating distillation scheme tailored to integrate depth-wise and width-wise pruning within a single training phase. This distillation framework enables flexible knowledge transfer across diverse pruning ratios, eliminating the need for per-configuration retraining. Extensive experiments on multiple Multi-Modal Diffusion Transformer (MM-DiT) models demonstrate that PPCL achieves a 50% reduction in parameter count compared to the full model, with less than 3% degradation in key objective metrics. Notably, our method maintains high-quality image generation capabilities while achieving higher compression ratios, rendering it well-suited for resource-constrained environments. The open-source code and checkpoints for PPCL are available at: https://github.com/OPPO-Mente-Lab/Qwen-Image-Pruning.
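The redundancy-identification idea described in the abstract (probe inter-layer similarity, then look for flat stretches in its first-order difference) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function and threshold names are hypothetical, and PPCL's actual probe uses learned linear heads rather than raw feature cosines.

```python
import math

def cosine(a, b):
    """Cosine similarity between two flat feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-8)

def redundant_layer_intervals(layer_feats, delta_thresh=0.02):
    """Flag spans of consecutive layers whose outputs barely change.

    layer_feats: one flattened feature vector per Transformer layer
    (stand-ins for probing outputs). A span is a redundancy candidate
    when the similarity curve between adjacent layers is flat, i.e. its
    first-order difference stays below delta_thresh.
    """
    # Similarity between each adjacent pair of layer outputs.
    sims = [cosine(p, c) for p, c in zip(layer_feats[:-1], layer_feats[1:])]
    # First-order differential of the similarity curve.
    diffs = [abs(sims[i + 1] - sims[i]) for i in range(len(sims) - 1)]
    flat = [d < delta_thresh for d in diffs]
    # Group consecutive flat positions into prunable intervals.
    intervals, start = [], None
    for i, f in enumerate(flat):
        if f and start is None:
            start = i
        elif not f and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(flat)))
    return sims, intervals
```

A run on six identical layer outputs would mark the whole depth range as one prunable interval; a sharp change in one layer's output breaks the interval at that point.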
Problem

Research questions and friction points this paper is trying to address.

Reducing computational costs of Diffusion Transformers
Identifying redundant layers through similarity analysis
Enabling flexible pruning without per-configuration retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prunes redundant layers via differential trend analysis
Uses plug-and-play teacher-student alternating distillation scheme
Enables flexible knowledge transfer across diverse pruning ratios
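The alternating teacher-student distillation listed above ultimately reduces to feature matching between retained student layers and the teacher layers they replace. A minimal sketch, assuming a simple MSE objective and a hypothetical student-to-teacher layer mapping; the paper's alternating schedule and exact loss formulation are not reproduced here.

```python
def consecutive_layer_distill_loss(teacher_feats, student_feats, mapping):
    """Mean-squared feature-matching loss over aligned layer pairs (sketch).

    teacher_feats / student_feats: flat feature vectors per layer.
    mapping: (student_idx, teacher_idx) pairs aligning each retained
    student layer with the teacher layer ending the span it replaces
    (hypothetical alignment for illustration).
    """
    total = 0.0
    for s_idx, t_idx in mapping:
        s, t = student_feats[s_idx], teacher_feats[t_idx]
        # Per-pair MSE on intermediate features.
        total += sum((sv - tv) ** 2 for sv, tv in zip(s, t)) / len(s)
    return total / max(len(mapping), 1)
```

Because the same loss applies to any student-to-teacher alignment, one trained student can, in principle, serve several pruning ratios by swapping the mapping rather than retraining per configuration.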
👥 Authors
Jian Ma (OPPO AI Center)
Qirong Peng (OPPO AI Center)
Xujie Zhu (Sun Yat-sen University)
Peixing Xie (The Chinese University of Hong Kong)
Chen Chen (OPPO AI Center)
Haonan Lu (OPPO AI Center)