Vision Transformers for End-to-End Quark-Gluon Jet Classification from Calorimeter Images

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the critical challenge of quark/gluon jet discrimination in high-energy physics through a systematic investigation of end-to-end classification with vision Transformers (ViTs) applied directly to multi-channel calorimeter images comprising electromagnetic (ECAL), hadronic (HCAL), and tracking layers. Two families of architectures are proposed: pure ViT variants and hybrid ViT-CNN models (ViT+MaxViT, ViT+ConvNeXt), which explicitly capture long-range substructure correlations within jets. Evaluated on the publicly available CMS 2012 simulated dataset, which incorporates realistic detector response and pile-up noise, the models achieve significant improvements over CNN baselines in F1-score, ROC-AUC, and accuracy. The study establishes the first ViT-based jet classification benchmark grounded in open collider data and releases a structured, multi-channel jet image dataset, introducing a new paradigm for deep learning in high-energy physics and enabling more effective modeling of complex, long-range jet topologies.

📝 Abstract
Distinguishing between quark- and gluon-initiated jets is a critical and challenging task in high-energy physics, pivotal for improving new physics searches and precision measurements at the Large Hadron Collider. While deep learning, particularly Convolutional Neural Networks (CNNs), has advanced jet tagging using image-based representations, the potential of Vision Transformer (ViT) architectures, renowned for modeling global contextual information, remains largely underexplored for direct calorimeter image analysis, especially under realistic detector and pileup conditions. This paper presents a systematic evaluation of ViTs and ViT-CNN hybrid models for quark-gluon jet classification using simulated 2012 CMS Open Data. We construct multi-channel jet-view images from detector-level energy deposits (ECAL, HCAL) and reconstructed tracks, enabling an end-to-end learning approach. Our comprehensive benchmarking demonstrates that ViT-based models, notably ViT+MaxViT and ViT+ConvNeXt hybrids, consistently outperform established CNN baselines in F1-score, ROC-AUC, and accuracy, highlighting the advantage of capturing long-range spatial correlations within jet substructure. This work establishes the first systematic framework and robust performance baselines for applying ViT architectures to calorimeter image-based jet classification using public collider data, alongside a structured dataset suitable for further deep learning research in this domain.
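The multi-channel jet images described in the abstract are built by binning detector-level energy deposits (ECAL, HCAL) and reconstructed tracks into an η-φ grid around the jet axis. A minimal sketch of that construction is below; the 125×125 grid, the ±1.25 window, and the channel ordering are illustrative assumptions, not values stated in this summary.

```python
import numpy as np

def make_jet_layer(eta, phi, energy, bins=125, extent=1.25):
    """Bin one detector layer's energy deposits into a 2D eta-phi grid.

    eta/phi are assumed to be measured relative to the jet axis, so the
    image is centred on the jet. Grid size and window are illustrative.
    """
    img, _, _ = np.histogram2d(
        eta, phi, bins=bins,
        range=[[-extent, extent], [-extent, extent]],
        weights=energy,
    )
    return img.astype(np.float32)

# Stack ECAL, HCAL, and track layers into one multi-channel image
# (random deposits here stand in for real detector hits).
rng = np.random.default_rng(0)
layers = []
for _ in range(3):  # ECAL, HCAL, tracks
    n = 50
    layers.append(make_jet_layer(
        rng.uniform(-1.25, 1.25, n),   # eta of deposits
        rng.uniform(-1.25, 1.25, n),   # phi of deposits
        rng.exponential(1.0, n),       # deposited energy
    ))
jet_image = np.stack(layers)  # shape: (3, 125, 125), CNN/ViT-ready
```

Per-channel normalization (e.g. dividing by total channel energy) is a common follow-up step before feeding such images to a network.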
Problem

Research questions and friction points this paper is trying to address.

Distinguishing quark- and gluon-initiated jets in high-energy physics
Exploring Vision Transformers for jet classification from calorimeter images
Benchmarking ViT models against CNNs for improved jet tagging performance
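The benchmarking question above reduces to comparing classifiers on F1-score, ROC-AUC, and accuracy. A hedged sketch of that evaluation loop, using scikit-learn on placeholder model outputs (the arrays here are synthetic, not the paper's results):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Hypothetical validation-set outputs: 0 = gluon jet, 1 = quark jet.
rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=1000)
# Scores loosely correlated with the labels, standing in for model probabilities.
y_prob = np.clip(0.35 * y_true + rng.uniform(0.0, 0.65, size=1000), 0.0, 1.0)
y_pred = (y_prob >= 0.5).astype(int)

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "f1":       f1_score(y_true, y_pred),
    "roc_auc":  roc_auc_score(y_true, y_prob),  # AUC takes scores, not hard labels
}
```

Note the asymmetry: accuracy and F1 operate on thresholded predictions, while ROC-AUC should be computed from the raw scores, so a model comparison is only fair when all models report both.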
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision Transformers analyze calorimeter images directly
Hybrid ViT-CNN models enhance jet classification
End-to-end learning with multi-channel jet images
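To make the end-to-end pipeline concrete, here is a minimal ViT-style classifier for 3-channel jet images in PyTorch. All hyperparameters (patch size, depth, width) and the 125×125 input are illustrative assumptions; this is a sketch of the general technique, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TinyJetViT(nn.Module):
    """Minimal ViT-style classifier for multi-channel jet images.

    Patches the image with a strided conv, prepends a CLS token, adds
    learned positional embeddings, runs a Transformer encoder, and
    classifies from the CLS token. Sizes are illustrative only.
    """
    def __init__(self, img_size=125, patch=25, dim=64, depth=2, heads=4, classes=2):
        super().__init__()
        n_patches = (img_size // patch) ** 2          # 5x5 = 25 patches here
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, classes)

    def forward(self, x):
        x = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, dim) patch tokens
        cls = self.cls.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos     # prepend CLS, add positions
        x = self.encoder(x)                           # global self-attention
        return self.head(x[:, 0])                     # classify from CLS token

logits = TinyJetViT()(torch.randn(4, 3, 125, 125))    # batch of 4 jet images
```

The self-attention layers let every patch attend to every other patch, which is the mechanism behind the long-range substructure modeling the summary credits to ViT-based models; a hybrid variant would replace the single conv patch embedding with a CNN feature extractor (e.g. a ConvNeXt stem).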