Sparse Computations in Deep Learning Inference

📅 2025-12-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
The computational and energy overhead of deep learning inference is increasingly prohibitive, yet sparsity—a key optimization avenue—remains underutilized in production systems. Method: Targeting performance engineers, this work systematically surveys the structured and unstructured sparsity exploitable in DNN inference and proposes an end-to-end engineering methodology, from sparse model representation to efficient sparse kernels (SpMM/SDDMM). We implement and benchmark multiple sparse computation schemes on CPU and GPU platforms, covering support in mainstream frameworks, toolchains, and datasets. Contribution/Results: We present the first production-grade sparse inference reference framework encompassing hardware adaptation, kernel optimization, and deployment validation. Experiments demonstrate 2–5× inference speedup and substantial energy efficiency gains across representative models, establishing a reproducible and scalable paradigm for industrial deployment of sparse deep learning.

📝 Abstract
The computational demands of modern Deep Neural Networks (DNNs) are immense and constantly growing. While training costs usually capture public attention, inference also contributes significantly to computational, energy, and environmental footprints. Sparsity stands out as a critical mechanism for drastically reducing these resource demands. However, its potential remains largely untapped and is not yet fully incorporated in production AI systems. To bridge this gap, this work provides the necessary knowledge and insights for performance engineers keen to get involved in deep learning inference optimization. In particular, in this work we: a) discuss the various forms of sparsity that can be utilized in DNN inference, b) explain how the original dense computations translate to sparse kernels, c) provide an extensive bibliographic review of the state of the art in the implementation of these kernels for CPUs and GPUs, d) discuss the availability of sparse datasets in support of sparsity-related research and development, e) explore the current software tools and frameworks that provide robust sparsity support, and f) present evaluation results of different implementations of the key SpMM and SDDMM kernels on CPU and GPU platforms. Ultimately, this paper aims to serve as a resource for performance engineers seeking to develop and deploy highly efficient sparse deep learning models in production.
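To make point b) concrete: the SpMM kernel evaluated in the paper replaces a dense matrix multiplication with one that touches only the stored nonzeros of a pruned weight matrix. The paper benchmarks optimized CPU/GPU implementations; the NumPy/SciPy sketch below only illustrates what the operation computes. The shapes and the magnitude-pruning threshold are arbitrary choices for the example, not values from the paper.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# A weight matrix after (crude) magnitude pruning: small entries zeroed out,
# leaving an unstructured sparsity pattern.
dense_w = rng.standard_normal((64, 32))
dense_w[np.abs(dense_w) < 1.3] = 0.0  # roughly 80% of entries become zero

# CSR stores only the nonzeros: row pointers, column indices, values.
w_csr = sp.csr_matrix(dense_w)

# A batch of dense input activations.
x = rng.standard_normal((32, 8))

# SpMM: sparse weights times dense activations; work scales with nnz,
# not with the full matrix size.
y_sparse = w_csr @ x

# The dense product gives the same result, computed the expensive way.
y_dense = dense_w @ x
assert np.allclose(y_sparse, y_dense)
```

The payoff in real kernels comes from skipping the zero entries entirely, at the cost of the indirect (index-driven) memory accesses the CSR format introduces.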
Problem

Research questions and friction points this paper is trying to address.

Optimizing sparse computations for efficient deep learning inference
Bridging the gap between sparsity potential and production AI systems
Providing knowledge for performance engineers on sparse DNN implementation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizing sparsity to reduce DNN inference resource demands
Translating dense computations into efficient sparse kernels
Evaluating sparse kernel implementations on CPU and GPU platforms
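The second kernel the paper evaluates, SDDMM (sampled dense-dense matrix multiplication), computes a dense product only at the nonzero positions of a given sparsity pattern, as in sparse attention or graph neural networks. The following NumPy/SciPy sketch is purely illustrative of the operation's semantics; the pattern density and matrix shapes are invented for the example.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(1)

# A sparsity pattern S (e.g. a graph adjacency or attention mask).
S = sp.random(16, 16, density=0.1, format="csr", random_state=1)
A = rng.standard_normal((16, 4))
B = rng.standard_normal((16, 4))

# SDDMM: evaluate (A @ B.T) only where S is nonzero --
# one dot product per stored entry of S.
rows, cols = S.nonzero()
vals = np.einsum("ij,ij->i", A[rows], B[cols])
out = sp.csr_matrix((vals, (rows, cols)), shape=S.shape)

# Reference: the full dense product, masked by the pattern.
mask = np.zeros(S.shape)
mask[rows, cols] = 1.0
dense_ref = (A @ B.T) * mask
assert np.allclose(out.toarray(), dense_ref)
```

As with SpMM, the savings come from computing only nnz(S) dot products instead of the full dense product, which is why SDDMM performance is dominated by the pattern's structure rather than the matrix dimensions.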
Ioanna Tasou
Computing Systems Laboratory (CSLab), National Technical University of Athens, Greece
Panagiotis Mpakos
Computing Systems Laboratory (CSLab), National Technical University of Athens, Greece
Angelos Vlachos
Artificial Intelligence and Learning Systems Laboratory (AILS), National Technical University of Athens, Greece
Dionysios Adamopoulos
Computing Systems Laboratory (CSLab), National Technical University of Athens, Greece
Georgios Giannakopoulos
Computing Systems Laboratory (CSLab), National Technical University of Athens, Greece
Konstantinos Katsikopoulos
Computing Systems Laboratory (CSLab), National Technical University of Athens, Greece
Ioannis Karaparisis
Computing Systems Laboratory (CSLab), National Technical University of Athens, Greece
Maria Lazou
Computing Systems Laboratory (CSLab), National Technical University of Athens, Greece
Spyridon Loukovitis
Artificial Intelligence and Learning Systems Laboratory (AILS), National Technical University of Athens, Greece
Areti Mei
Computing Systems Laboratory (CSLab), National Technical University of Athens, Greece
Anastasia Poulopoulou
Computing Systems Laboratory (CSLab), National Technical University of Athens, Greece
Angeliki Dimitriou
PhD Student, National Technical University of Athens
XAI, Graph Neural Networks, Generative Models
Giorgos Filandrianos
Postdoctoral researcher
Explainable AI, NLP
Dimitrios Galanopoulos
Computing Systems Laboratory (CSLab), National Technical University of Athens, Greece
Vasileios Karampinis
Artificial Intelligence and Learning Systems Laboratory (AILS), National Technical University of Athens, Greece
Ilias Mitsouras
Artificial Intelligence and Learning Systems Laboratory (AILS), National Technical University of Athens, Greece
Nikolaos Spanos
PhD Student, National Technical University of Athens
Computer Vision, Generative AI, Domain Generalization
Petros Anastasiadis
Computing Systems Laboratory (CSLab), National Technical University of Athens, Greece
Ioannis Doudalis
Computing Systems Laboratory (CSLab), National Technical University of Athens, Greece
Konstantinos Nikas
Computing Systems Laboratory (CSLAB), NTUA
Computer Architecture, Parallel programming models, HPC, Cloud Computing
G. Retsinas
Intelligent Robotics and Automation Laboratory (IRAL), National Technical University of Athens, Greece
Paraskevi Tzouveli
Artificial Intelligence and Learning Systems Laboratory (AILS), National Technical University of Athens, Greece
Christina Giannoula
Postdoctoral Researcher, University of Toronto
Computer Architecture, Computer Systems, Processing-In-Memory, Machine Learning, High Performance
N. Koziris
Computing Systems Laboratory (CSLab), National Technical University of Athens, Greece
Nikela Papadopoulou
School of Computing Science, University of Glasgow, United Kingdom
G. Stamou
Artificial Intelligence and Learning Systems Laboratory (AILS), National Technical University of Athens, Greece
Athanasios Voulodimos
Artificial Intelligence and Learning Systems Laboratory (AILS), National Technical University of Athens, Greece
Georgios Goumas
National Technical University of Athens
Parallel processing, High Performance Computing, Computing Systems, Operating Systems