Sub-microsecond Transformers for Jet Tagging on FPGAs

📅 2025-10-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing Transformer models for LHC jet tagging in high-energy physics suffer from prohibitively high computational complexity, rendering them unsuitable for hardware-trigger systems requiring sub-microsecond real-time inference. Method: This work introduces the first integration of multi-head and linear attention mechanisms into the hls4ml toolchain, enabling end-to-end Transformer deployment on a single FPGA. We propose a synergistic optimization combining fine-grained quantization and distributed arithmetic, co-designed with custom hardware architecture. Contribution/Results: The implementation achieves an inference latency of ~100 ns—improving upon state-of-the-art FPGA-based accelerators by one to two orders of magnitude—while significantly reducing resource utilization. This represents the first practical, hardware-efficient Transformer implementation validated for next-generation, high-luminosity LHC real-time trigger systems.

Technology Category

Application Category

📝 Abstract
We present the first sub-microsecond transformer implementation on an FPGA achieving competitive performance for state-of-the-art high-energy physics benchmarks. Transformers have shown exceptional performance on multiple tasks in modern machine learning applications, including jet tagging at the CERN Large Hadron Collider (LHC). However, their computational complexity prohibits use in real-time applications, such as the hardware trigger system of the collider experiments up until now. In this work, we demonstrate the first application of transformers for jet tagging on FPGAs, achieving $mathcal{O}(100)$ nanosecond latency with superior performance compared to alternative baseline models. We leverage high-granularity quantization and distributed arithmetic optimization to fit the entire transformer model on a single FPGA, achieving the required throughput and latency. Furthermore, we add multi-head attention and linear attention support to hls4ml, making our work accessible to the broader fast machine learning community. This work advances the next-generation trigger systems for the High Luminosity LHC, enabling the use of transformers for real-time applications in high-energy physics and beyond.
Problem

Research questions and friction points this paper is trying to address.

Overcoming transformer computational complexity for real-time physics applications
Enabling sub-microsecond jet tagging on FPGA trigger systems
Implementing quantized transformers for high-energy physics experiments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sub-microsecond transformer implementation on FPGA
High-granularity quantization and distributed arithmetic optimization
Multi-head attention and linear attention support in hls4ml
🔎 Similar Papers
No similar papers found.
L
Lauri Laatu
Imperial College London, United Kingdom
C
Chang Sun
California Institute of Technology, USA
A
Arianna Cox
Imperial College London, United Kingdom
A
Abhijith Gandrakota
Fermilab, USA
B
Benedikt Maier
Imperial College London, United Kingdom
Jennifer Ngadiuba
Jennifer Ngadiuba
Wilson Fellow, Fermilab
experimental high-energy physicsdata sciencedeep learningartificial intelligenceFPGAs
Z
Zhiqiang Que
Imperial College London, United Kingdom
Wayne Luk
Wayne Luk
Professor of Computer Engineering, Imperial College London
Hardware and ArchitectutreReconfigurable ComputingDesign Automation
M
Maria Spiropulu
California Institute of Technology, USA
A
Alexander Tapper
Imperial College London, United Kingdom