🤖 AI Summary
Particle physics detector data exhibit sparsity and non-Euclidean spatial distributions, posing challenges for scalable, general-purpose AI modeling in high-energy physics.
Method: This work introduces the first scalable foundation model (FM) for nuclear and particle physics, designed specifically for such data. It proposes a physics-informed self-supervised pretraining paradigm for collision events, built on a neurally scalable architecture with a frozen backbone and lightweight task-specific adapters for efficient cross-task transfer.
Contribution/Results: Trained and validated on a large-scale dataset of over 11 million high-energy collision events, the FM consistently outperforms baseline models across all downstream tasks and adapts with strong data efficiency and cross-task generalization, establishing the first scalable FM paradigm for high-energy physics AI.
📝 Abstract
Large language models have revolutionized artificial intelligence by enabling large, generalizable models trained through self-supervision. This paradigm has inspired the development of scientific foundation models (FMs). However, applying this capability to experimental particle physics is challenging because the sparse, spatially distributed nature of detector data differs dramatically from natural language. This work addresses whether an FM for particle physics can scale and generalize across diverse tasks. We introduce a new dataset of more than 11 million particle collision events, together with a suite of labeled downstream tasks for evaluation. We propose a novel self-supervised training method for detector data and demonstrate its neural scalability with models of up to 188 million parameters. With frozen weights and task-specific adapters, the FM consistently outperforms baseline models across all downstream tasks, and its adaptation is robustly data-efficient. Further analysis reveals that the representations extracted by the FM are task-agnostic but can be specialized via a single linear mapping for different downstream tasks.
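To make the frozen-backbone transfer recipe concrete, here is a minimal PyTorch sketch of the setup described above: a pretrained encoder is frozen, and a single linear mapping per downstream task is trained on top of its task-agnostic features. All names here (`PretrainedEventEncoder`, the input and embedding sizes, the class count) are hypothetical stand-ins for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the pretrained FM backbone; the paper's
# actual encoder architecture and dimensions are not specified here.
class PretrainedEventEncoder(nn.Module):
    def __init__(self, in_dim: int = 64, embed_dim: int = 256):
        super().__init__()
        self.embed_dim = embed_dim
        self.net = nn.Sequential(
            nn.Linear(in_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

backbone = PretrainedEventEncoder()
backbone.requires_grad_(False)  # freeze the FM weights
backbone.eval()

# Lightweight task-specific adapter: a single linear mapping, mirroring
# the abstract's claim that one linear map specializes the task-agnostic
# representation for a given downstream task.
num_classes = 5  # hypothetical downstream label count
adapter = nn.Linear(backbone.embed_dim, num_classes)

optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a dummy batch of per-event features.
events = torch.randn(32, 64)                 # 32 events, 64 input features
labels = torch.randint(0, num_classes, (32,))

with torch.no_grad():                        # backbone stays frozen
    features = backbone(events)

optimizer.zero_grad()
logits = adapter(features)
loss = loss_fn(logits, labels)
loss.backward()                              # gradients flow only to the adapter
optimizer.step()
```

Because only the adapter's small parameter set is updated while the backbone is shared across tasks, each new task needs comparatively little labeled data, which is consistent with the data-efficient adaptation the paper reports.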