hls4ml: A Flexible, Open-Source Platform for Deep Learning Acceleration on Reconfigurable Hardware

📅 2025-12-01
🤖 AI Summary
To address the challenges of low latency, low power consumption, and stringent resource constraints when deploying deep learning models on reconfigurable hardware (e.g., FPGAs and ASICs), this paper proposes an open-source, modular hardware–software co-compilation framework. The framework supports mainstream frontends—including TensorFlow and PyTorch—and is compatible with heterogeneous high-level synthesis (HLS) toolchains such as Xilinx Vitis HLS, Intel oneAPI, and Catapult HLS. It integrates model quantization, structured pruning, and hardware-aware scheduling to enable end-to-end automatic generation of synthesizable HLS code. Compared to manual RTL design, our approach significantly improves deployment efficiency: across diverse scientific and industrial inference workloads, it achieves average reductions of 37% in logic resource utilization, 42% in latency, and 31% in power consumption. Experimental results demonstrate strong cross-platform scalability and practical system applicability.
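The quantization step mentioned above maps floating-point weights and activations onto fixed-point values analogous to the HLS `ap_fixed<W,I>` type. A minimal, self-contained sketch of that idea (a hypothetical helper for illustration, not hls4ml's actual implementation):

```python
def quantize_fixed(x, total_bits=16, int_bits=6):
    """Round x to the nearest value representable as a signed fixed-point
    number with total_bits bits, int_bits of which (including the sign)
    sit left of the binary point -- analogous to HLS ap_fixed<W,I>."""
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits                  # 2**frac_bits
    # Saturation bounds of the underlying signed integer representation.
    max_int = (1 << (total_bits - 1)) - 1
    min_int = -(1 << (total_bits - 1))
    q = round(x * scale)
    q = max(min_int, min(max_int, q))       # saturate on overflow
    return q / scale

# Small values keep ~2**-10 precision; large values saturate.
weights = [0.1234567, -1.5, 3.999, 100.0]
print([quantize_fixed(w) for w in weights])
```

Values within range are reproduced to within one least-significant fractional bit, while out-of-range values saturate instead of wrapping, which is the behavior typically wanted for neural-network weights.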

📝 Abstract
We present hls4ml, a free and open-source platform that translates machine learning (ML) models from modern deep learning frameworks into high-level synthesis (HLS) code that can be integrated into full designs for field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs). With its flexible and modular design, hls4ml supports a large number of deep learning frameworks and can target HLS compilers from several vendors, including Vitis HLS, Intel oneAPI, and Catapult HLS. Together with a wider ecosystem for software-hardware co-design, hls4ml has enabled the acceleration of ML inference in a wide range of commercial and scientific applications where low latency, resource usage, and power consumption are critical. In this paper, we describe the structure and functionality of the hls4ml platform. The overarching design considerations for the generated HLS code are discussed, together with selected performance results.
Problem

Research questions and friction points this paper is trying to address.

Deploying deep learning inference on FPGAs and ASICs has traditionally required expert, hand-written RTL design.
Target applications impose strict constraints on latency, logic resources, and power consumption.
Model support is fragmented across deep learning frameworks and vendor-specific HLS toolchains.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular compiler that translates models from frameworks such as TensorFlow and PyTorch into synthesizable HLS code
Backend support for multiple vendor toolchains, including Vitis HLS, Intel oneAPI, and Catapult HLS
Integrated quantization, pruning, and hardware-aware scheduling for low-latency, resource-efficient inference
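The hardware-aware scheduling mentioned above is exposed in hls4ml through a reuse factor: how many times each hardware multiplier is time-shared per layer evaluation, trading latency for resource usage. A toy cost model of that tradeoff for an idealized fully connected layer (illustrative arithmetic only, not measured results):

```python
def dense_layer_cost(n_in, n_out, reuse_factor):
    """Estimate multiplier count and serialization latency (in passes)
    for a fully connected layer when each hardware multiplier is reused
    reuse_factor times per inference. reuse_factor=1 is fully parallel."""
    n_mults = n_in * n_out                    # total multiplications per inference
    assert n_mults % reuse_factor == 0, "reuse factor must divide the workload"
    multipliers = n_mults // reuse_factor     # multipliers (e.g. DSPs) instantiated
    passes = reuse_factor                     # serialized time steps through them
    return multipliers, passes

# A 16->32 dense layer at a few reuse factors:
for rf in (1, 4, 16):
    m, p = dense_layer_cost(16, 32, rf)
    print(f"reuse={rf:2d}: {m:3d} multipliers, {p:2d} passes")
```

Raising the reuse factor shrinks the multiplier count proportionally while lengthening the schedule, which is the knob that lets one design span both ultra-low-latency triggers and resource-constrained devices.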
👥 Authors
Jan-Frederik Schulte (Purdue University, USA)
Benjamin Ramhorst (ETH Zurich, Switzerland)
Chang Sun (California Institute of Technology, USA)
Jovan Mitrevski (Fermi National Accelerator Lab, USA)
Nicolò Ghielmetti (European Organization for Nuclear Research (CERN), Switzerland)
Enrico Lupi (European Organization for Nuclear Research (CERN), Switzerland)
Dimitrios Danopoulos (CERN)
Vladimir Loncar (CERN)
Javier Duarte (University of California San Diego, USA)
David Burnette (Catapult HLS, Siemens EDA, USA)
Lauri Laatu (Imperial College London, United Kingdom)
Stylianos Tzelepis (National Technical University of Athens, Greece)
Konstantinos Axiotis (University of Geneva, Switzerland)
Quentin Berthet (Google DeepMind, Paris)
Haoyan Wang (Altera Corporation, USA)
Paul White (Altera Corporation, USA)
Suleyman Demirsoy (Altera Corporation, USA)
Marco Colombo (Discovery Partners Institute, USA)
Thea Aarrestad (ETH Zurich, Switzerland)
Sioni Summers (European Organization for Nuclear Research (CERN), Switzerland)
Maurizio Pierini (CERN)
Giuseppe Di Guglielmo (Fermi National Accelerator Lab, USA)
Jennifer Ngadiuba (Wilson Fellow, Fermilab)
Javier Campos (Fermi National Accelerator Lab, USA)
Ben Hawks (Fermi National Accelerator Lab, USA)