SURFing to the Fundamental Limit of Jet Tagging

📅 2025-11-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the fundamental statistical performance limits of jet tagging algorithms in high-energy physics. Method: We propose SURF (SUrrogate ReFerence), a framework for validating whether a generative model accurately captures the underlying data distribution. A tractable surrogate model (EPiC-FM) is first trained on real data, and the target generative model is then trained on samples from this surrogate. Because the surrogate's likelihood is tractable, the distribution the target model learns from is known exactly, which enables exact Neyman–Pearson hypothesis tests. Contribution/Results: Autoregressive GPT-style models are found to exaggerate the top vs. QCD separation power encoded in the surrogate reference, which inflates estimates of the performance ceiling; in contrast, modern jet taggers may already operate close to the true statistical limit. The method provides a distributional consistency test for generative models in high-energy physics and a reference for judging how close jet tagging algorithms are to their fundamental limit.
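For reference, the Neyman–Pearson lemma behind the "exact" tests mentioned above can be written as follows. The notation (p_s and p_b for the signal and background densities, Λ for the likelihood ratio, k_α for the decision threshold) is introduced here for illustration and is not taken from the paper.

```latex
\Lambda(x) = \frac{p_s(x)}{p_b(x)}, \qquad
\text{reject } H_0\colon x \sim p_b \iff \Lambda(x) > k_\alpha, \qquad
\Pr_{x \sim p_b}\!\left[\Lambda(x) > k_\alpha\right] = \alpha .
```

At every false-positive rate α this test has the highest true-positive rate of any test, so the ROC curve of Λ bounds the ROC curve of any tagger acting on data drawn from p_s and p_b. When p_s and p_b are known exactly, as they are for a tractable surrogate, this bound can be computed rather than estimated.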

📝 Abstract
Beyond the practical goal of improving search and measurement sensitivity through better jet tagging algorithms, there is a deeper question: what are their upper performance limits? Generative surrogate models with learned likelihood functions offer a new approach to this problem, provided the surrogate correctly captures the underlying data distribution. In this work, we introduce the SUrrogate ReFerence (SURF) method, a new approach to validating generative models. This framework enables exact Neyman-Pearson tests by training the target model on samples from another tractable surrogate, which is itself trained on real data. We argue that the EPiC-FM generative model is a valid surrogate reference for JetClass jets and apply SURF to show that modern jet taggers may already be operating close to the true statistical limit. By contrast, we find that autoregressive GPT models unphysically exaggerate top vs. QCD separation power encoded in the surrogate reference, implying that they are giving a misleading picture of the fundamental limit.
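The SURF logic can be illustrated with a deliberately simple toy. It assumes one-dimensional Gaussians as a stand-in for the tractable surrogate (the role EPiC-FM plays in the paper) and a per-class Gaussian maximum-likelihood fit as a stand-in for the target generative model; all names and numbers are hypothetical. Because the surrogate's densities are known, its likelihood ratio gives the exact Neyman-Pearson limit. The target model's own likelihood ratio, scored on its own samples, gives the limit the target claims, and a faithful target should reproduce the exact value. This is a sketch under those assumptions, not the authors' implementation.

```python
import math

import numpy as np


def gaussian_logpdf(x, mu, sigma):
    """Log-density of a 1D normal distribution N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def log_lr(x, sig, bkg):
    """Neyman-Pearson test statistic log p_s(x) - log p_b(x); sig/bkg are (mean, std)."""
    return gaussian_logpdf(x, *sig) - gaussian_logpdf(x, *bkg)


def auc(scores_sig, scores_bkg):
    """ROC AUC via the Mann-Whitney rank statistic (ties assumed negligible)."""
    scores = np.concatenate([scores_sig, scores_bkg])
    ranks = np.argsort(np.argsort(scores)) + 1  # 1-based ranks
    n_s, n_b = len(scores_sig), len(scores_bkg)
    return (ranks[:n_s].sum() - n_s * (n_s + 1) / 2) / (n_s * n_b)


def self_claimed_auc(sig, bkg, n, rng):
    """AUC a model implies about itself: sample from it, score with its own log-LR."""
    x_s = rng.normal(*sig, size=n)
    x_b = rng.normal(*bkg, size=n)
    return auc(log_lr(x_s, sig, bkg), log_lr(x_b, sig, bkg))


rng = np.random.default_rng(0)
n = 20_000

# Hypothetical tractable surrogate reference with exact (mean, std) per class.
SURROGATE_SIG = (1.0, 1.0)  # stand-in for "top"
SURROGATE_BKG = (0.0, 1.0)  # stand-in for "QCD"

# Exact Neyman-Pearson limit: the surrogate's own likelihood ratio on its own samples.
auc_exact = self_claimed_auc(SURROGATE_SIG, SURROGATE_BKG, n, rng)

# Train the hypothetical target model on surrogate samples (per-class Gaussian MLE fit).
train_sig = rng.normal(*SURROGATE_SIG, size=n)
train_bkg = rng.normal(*SURROGATE_BKG, size=n)
target_sig = (train_sig.mean(), train_sig.std())
target_bkg = (train_bkg.mean(), train_bkg.std())

# Limit claimed by the target; a faithful target should match auc_exact.
auc_claimed = self_claimed_auc(target_sig, target_bkg, n, rng)

# Illustration only: a mis-specified target whose widths are 20% too narrow.
narrow_sig = (target_sig[0], 0.8 * target_sig[1])
narrow_bkg = (target_bkg[0], 0.8 * target_bkg[1])
auc_narrow = self_claimed_auc(narrow_sig, narrow_bkg, n, rng)

# Analytic value for equal-width Gaussians: Phi(delta_mu / (sigma * sqrt(2))),
# which for delta_mu = sigma = 1 equals 0.5 * (1 + erf(1/2)).
auc_analytic = 0.5 * (1 + math.erf(0.5))

print(f"exact={auc_exact:.3f} claimed={auc_claimed:.3f} "
      f"narrow={auc_narrow:.3f} analytic={auc_analytic:.3f}")
```

In this toy the exact and claimed AUCs should both land close to the analytic value of about 0.760, while the deliberately narrowed target claims a noticeably higher AUC: a miniature analogue of a generative model overstating the separation power encoded in its training distribution. Note that the overshoot comes from the target evaluating itself; a tagger built from any model's likelihood ratio but scored on surrogate samples cannot exceed the exact limit beyond statistical fluctuations.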
Problem

Research questions and friction points this paper is trying to address.

Determining upper performance limits of jet tagging algorithms
Validating generative models using surrogate reference methods
Assessing whether modern jet taggers approach statistical limits
Innovation

Methods, ideas, or system contributions that make the work stand out.

SURF method validates generative models via surrogate reference
EPiC-FM serves as surrogate reference for JetClass jets
Exact Neyman-Pearson tests enabled by training the target model on samples from a tractable surrogate
Ian Pang
NHETC, Dept. of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854, USA
Darius A. Faroughy
NHETC, Dept. of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854, USA
David Shih
NHETC, Dept. of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854, USA
Ranit Das
NHETC, Dept. of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854, USA; Institute for Theoretical Physics, Universität Heidelberg, Germany
Gregor Kasieczka
Universität Hamburg
Particle Physics, Machine Learning, Anomaly Detection