🤖 AI Summary
The LHCb Level-1 trigger imposes stringent requirements on track reconstruction—namely, high throughput, ultra-low latency, and minimal power consumption—necessitating efficient hardware-accelerated inference. Method: This work presents the first systematic comparison of FPGA and GPU implementations for multilayer perceptron (MLP) inference within a real-time high-energy physics trigger context. It employs HLS4ML for accessible FPGA deployment, integrates a graph neural network (GNN)-based preprocessing pipeline, and leverages CUDA-accelerated GPU inference, all benchmarked on real LHCb detector data. Contribution/Results: The FPGA implementation achieves throughput comparable to the GPU baseline while reducing latency by 40% and power consumption by 75%. By lowering the barrier to FPGA development and demonstrating robust performance under realistic trigger bandwidth constraints, this study validates FPGA-based acceleration as a practical solution for real-time machine learning triggers in high-energy physics experiments.
📝 Abstract
In high-energy physics, the increasing luminosity and detector granularity at the Large Hadron Collider are driving the need for more efficient data processing solutions. Machine Learning has emerged as a promising tool for reconstructing charged particle tracks, due to its potentially linear computational scaling with detector hits. The recent implementation of a graph neural network-based track reconstruction pipeline in the first-level trigger of the LHCb experiment on GPUs serves as a platform for comparative studies between computational architectures in the context of high-energy physics. This paper presents a novel comparison of the throughput of ML model inference between FPGAs and GPUs, focusing on the first step of the track reconstruction pipeline – an implementation of a multilayer perceptron. Using HLS4ML for FPGA deployment, we benchmark its performance against the GPU implementation and demonstrate the potential of FPGAs for high-throughput, low-latency inference without the need for expertise in FPGA development, while consuming significantly less power.
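The linear scaling mentioned above comes from the fact that an MLP scores each detector hit independently, so total cost grows proportionally with the number of hits. The following is a minimal illustrative sketch of such a forward pass; the layer sizes, weights, and ReLU activation are toy assumptions for demonstration only, not the architecture deployed in the LHCb pipeline.

```python
# Toy MLP forward pass, dependency-free. All weights and shapes below are
# illustrative assumptions, NOT the LHCb model.

def relu(x):
    return [max(0.0, v) for v in x]

def linear(weights, bias, x):
    # weights: one row of coefficients per output neuron
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def mlp_forward(hit_features, layers):
    """Run one hit's feature vector through stacked (W, b) layers."""
    x = hit_features
    for i, (W, b) in enumerate(layers):
        x = linear(W, b, x)
        if i < len(layers) - 1:  # ReLU on hidden layers only
            x = relu(x)
    return x

# Tiny 2-4-1 network with fixed toy weights.
layers = [
    ([[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8], [0.2, 0.2]],
     [0.0, 0.1, 0.0, -0.1]),
    ([[0.3, -0.1, 0.5, 0.2]], [0.05]),
]

# Each hit is scored independently, so cost is linear in len(hits).
hits = [[0.9, 0.1], [0.2, 0.7], [0.5, 0.5]]
scores = [mlp_forward(h, layers) for h in hits]
```

Because every hit passes through the same small network with no cross-hit coupling at this stage, the work maps naturally onto both GPU batching and fully pipelined FPGA logic.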