🤖 AI Summary
Large-scale deep models incur high memory overhead during inference due to their massive parameter counts. Existing parameter-sharing methods rely on heuristic, adjacent-layer designs and lack systematic scalability across multiple layers. This work pioneers a graph coloring formulation for cross-layer parameter sharing, leveraging structural symmetry in the model's parameter space for rigorous, system-level modeling. From a group-theoretic perspective, we analyze sharing mechanisms and introduce an analytical criterion grounded in second-order gradient geometry, which guides parameter projection onto low-curvature subspaces. By combining Hessian spectral analysis with Taylor expansion, we formulate parameter grouping as the search for an optimal coloring function α: L → C. Evaluated across diverse architectures and tasks, our method consistently outperforms state-of-the-art approaches, achieving superior accuracy at higher compression ratios and demonstrating both theoretical rigor and engineering scalability.
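The coloring-function idea above can be illustrated with a minimal sketch: layers mapped to the same color share one underlying weight tensor, so the number of distinct tensors equals the number of sharing classes. The function and variable names here are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def build_shared_params(num_layers, alpha, shape, rng=None):
    """Allocate one weight tensor per color; map each layer to its color's tensor.

    alpha is a coloring function L -> C given as a dict {layer_index: color}.
    """
    rng = rng or np.random.default_rng(0)
    colors = sorted(set(alpha.values()))
    bank = {c: rng.standard_normal(shape) for c in colors}  # one tensor per sharing class
    return [bank[alpha[l]] for l in range(num_layers)]

# Example: 6 layers, 3 sharing classes -> distinct parameter tensors shrink 2x.
alpha = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
layers = build_shared_params(6, alpha, shape=(4, 4))
assert layers[0] is layers[1]      # same color -> same underlying tensor
assert layers[0] is not layers[2]  # different colors -> distinct tensors
```

The search problem the paper addresses is choosing `alpha` itself: the number of valid colorings grows combinatorially with depth, which is why an analytic selection criterion is needed rather than exhaustive search.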
📝 Abstract
Modern deep models have massive parameter counts, leading to high inference-time memory usage that limits practical deployment. Parameter sharing, a form of structured compression, effectively reduces redundancy, but existing approaches remain heuristic: they are restricted to adjacent layers and lack a systematic analysis of cross-layer sharing. Moreover, extending sharing across multiple layers leads to an exponentially expanding configuration space, making exhaustive search computationally infeasible and forming a critical bottleneck for parameter sharing. We recast parameter sharing from a group-theoretic perspective as introducing structural symmetries in the model's parameter space. A sharing configuration can be described by a coloring function $\alpha: L \rightarrow C$ (with $L$ the layer indices and $C$ the sharing classes), which determines inter-layer sharing groups while preserving structural symmetry. To determine the coloring function, we propose a second-order geometric criterion based on Taylor expansion and the Hessian spectrum. By projecting perturbations onto the Hessian's low-curvature eigensubspace, the criterion provides an analytic rule for selecting sharing groups that minimize performance impact, yielding a principled and scalable configuration procedure. Across diverse architectures and tasks, Geo-Sharing consistently outperforms state-of-the-art heuristic sharing strategies, achieving higher compression ratios with smaller accuracy degradation.
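The second-order criterion can be sketched with a toy example: under the Taylor approximation, a sharing-induced weight perturbation $\delta$ changes the loss by roughly $\tfrac{1}{2}\,\delta^\top H\,\delta$, so restricting $\delta$ to the Hessian's low-curvature eigensubspace keeps the predicted loss change small. This is a minimal illustration under assumed names, not the paper's implementation.

```python
import numpy as np

def curvature_score(H, delta):
    """Second-order Taylor estimate of the loss change caused by perturbation delta."""
    return 0.5 * delta @ H @ delta

def project_low_curvature(H, delta, k):
    """Keep only delta's components along the k smallest-eigenvalue directions of H."""
    eigvals, eigvecs = np.linalg.eigh(H)  # eigenvalues in ascending order
    U = eigvecs[:, :k]                    # basis of the low-curvature eigensubspace
    return U @ (U.T @ delta)

# Toy Hessian with one stiff direction and two flat ones.
H = np.diag([100.0, 1.0, 0.1])
delta = np.ones(3)
raw = curvature_score(H, delta)                                 # 50.55
flat = curvature_score(H, project_low_curvature(H, delta, 2))   # 0.55
assert flat < raw  # confining the perturbation to flat directions shrinks the predicted loss change
```

In this picture, candidate sharing groups are ranked by such curvature scores: groups whose merge perturbation lies mostly in flat directions of the Hessian are predicted to cost the least accuracy.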