Rethinking Parameter Sharing as Graph Coloring for Structured Compression

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large-scale deep models incur high memory overhead during inference due to their massive parameter counts. Existing parameter-sharing methods rely on heuristic, adjacent-layer designs and lack systematic scalability across multiple layers. This work pioneers a graph coloring formulation for cross-layer parameter sharing, leveraging structural symmetry in the model’s parameter space for rigorous, system-level modeling. From a group-theoretic perspective, we analyze sharing mechanisms and introduce an analytical criterion grounded in second-order gradient geometry—guiding parameter projection onto low-curvature subspaces. By combining Hessian spectral analysis with Taylor expansion, we formulate optimal parameter grouping as finding the optimal coloring function α: L → C. Evaluated across diverse architectures and tasks, our method consistently outperforms state-of-the-art approaches, achieving superior accuracy at higher compression ratios—demonstrating both theoretical rigor and engineering scalability.

📝 Abstract
Modern deep models have massive parameter sizes, leading to high inference-time memory usage that limits practical deployment. Parameter sharing, a form of structured compression, effectively reduces redundancy, but existing approaches remain heuristic: they are restricted to adjacent layers and lack a systematic analysis of cross-layer sharing. Extending sharing across multiple layers, however, leads to an exponentially expanding configuration space, making exhaustive search computationally infeasible and forming a critical bottleneck for parameter sharing. We recast parameter sharing from a group-theoretic perspective as introducing structural symmetries in the model's parameter space. A sharing configuration can be described by a coloring function $\alpha: L \rightarrow C$ ($L$: layer indices; $C$: sharing classes), which determines inter-layer sharing groups while preserving structural symmetry. To determine the coloring function, we propose a second-order geometric criterion based on Taylor expansion and the Hessian spectrum. By projecting perturbations onto the Hessian's low-curvature eigensubspace, the criterion provides an analytic rule for selecting sharing groups that minimize performance impact, yielding a principled and scalable configuration procedure. Across diverse architectures and tasks, Geo-Sharing consistently outperforms state-of-the-art heuristic sharing strategies, achieving higher compression ratios with smaller accuracy degradation.
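As a rough illustration of the coloring view (a sketch, not the authors' implementation), a sharing configuration can be stored as a plain mapping from layer index to sharing class. The helper names `sharing_groups` and `count_partitions` are hypothetical. Counting configurations with Bell numbers assumes the number of sharing classes is unconstrained, which the abstract does not state; it is used here only to show how quickly the search space grows.

```python
from collections import defaultdict


def sharing_groups(alpha):
    """Invert a coloring {layer_index: sharing_class} into {sharing_class: [layers]}.

    Layers that receive the same class would share one set of parameters.
    """
    groups = defaultdict(list)
    for layer, color in sorted(alpha.items()):
        groups[color].append(layer)
    return dict(groups)


def count_partitions(n):
    """Number of ways to split n layers into unlabeled sharing groups.

    This is the Bell number B_n, computed with the Bell triangle.
    Assumption: any number of sharing classes is allowed.
    """
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
    return row[-1]


# Hypothetical 4-layer model where non-adjacent layers share parameters.
alpha = {0: "a", 1: "b", 2: "a", 3: "b"}
print(sharing_groups(alpha))
print(count_partitions(4), count_partitions(12))
```

Under this counting, 4 layers admit 15 groupings and 12 layers already admit more than four million, which is why an analytic selection rule is attractive compared with exhaustive search.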
Problem

Research questions and friction points this paper is trying to address.

Reducing massive parameter sizes in deep models
Systematically analyzing cross-layer parameter sharing configurations
Minimizing performance impact while achieving high compression ratios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulating parameter sharing as a graph coloring problem
Using the Hessian spectrum to build a geometric sharing criterion
Enabling cross-layer sharing via a principled configuration procedure
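
The second-order idea behind the criterion can be sketched numerically. The snippet below is an assumption-laden illustration, not the paper's criterion: it uses the standard Taylor estimate of the loss change, ΔLoss ≈ gᵀδ + ½ δᵀHδ, for a parameter perturbation δ (for example, the change caused by tying two layers together), and it projects δ onto the eigenvectors of H with the smallest eigenvalues. The function names, the toy Hessian, and the choice of `k` are all hypothetical, and the Hessian is assumed symmetric and positive semidefinite.

```python
import numpy as np


def second_order_cost(delta, hessian, grad=None):
    """Taylor estimate of the loss change for perturbation `delta`.

    Uses grad^T delta + 0.5 * delta^T H delta; the gradient term is
    skipped when `grad` is None (e.g. near a converged minimum).
    """
    cost = 0.5 * delta @ hessian @ delta
    if grad is not None:
        cost += grad @ delta
    return float(cost)


def project_low_curvature(delta, hessian, k):
    """Project `delta` onto the span of the k lowest-eigenvalue eigenvectors of H."""
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(hessian)
    basis = eigvecs[:, :k]
    return basis @ (basis.T @ delta)


# Toy Hessian with two flat directions and two sharp ones (illustrative only).
H = np.diag([0.01, 0.02, 5.0, 10.0])
delta = np.ones(4)

full_cost = second_order_cost(delta, H)
flat_delta = project_low_curvature(delta, H, k=2)
flat_cost = second_order_cost(flat_delta, H)
print(full_cost, flat_cost)
```

In this toy case the projected perturbation has a far smaller estimated loss change than the raw one, which is the intuition for preferring sharing groups whose induced perturbation lies mostly in low-curvature directions.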
Boyang Zhang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Daning Cheng
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yunquan Zhang
Professor of Institute of Computing Technology, CAS
parallel computing, parallel programming, parallel computational model