🤖 AI Summary
In high-energy nuclear physics, final-state data from heavy-ion collisions (HIC) are high-dimensional and structurally complex, so conventional approaches that rely on hand-crafted observables risk missing nonlinear physical correlations. To address this, we propose a masked point cloud Transformer autoencoder framework trained in two stages: first, self-supervised pretraining to learn compact, information-rich latent representations; second, supervised fine-tuning for downstream tasks. Our method integrates point cloud modeling, self-supervised learning, and interpretability analysis (via SHAP and PCA), enabling both robust discrimination and physical insight. It achieves state-of-the-art performance on collision system size classification, significantly outperforming the PointNet baseline. The learned features show strong discriminative power while remaining physically interpretable, e.g., aligning with known collective flow patterns and centrality-dependent trends. All code is publicly available.
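The two-stage paradigm described above can be sketched in miniature. The following is an illustrative toy, not the paper's implementation: a single linear encoder/decoder stands in for the Transformer blocks, and the event size, feature count, latent dimension, and masking ratio are all assumed values chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy final-state "event": n_points particles, each with n_feat kinematic
# features (e.g. pT, eta, phi, m). All shapes below are illustrative.
n_points, n_feat, latent_dim, mask_ratio = 128, 4, 16, 0.6
event = rng.normal(size=(n_points, n_feat))

# Stage 1 (self-supervised pretraining): mask a fraction of the points,
# encode the rest, and score reconstruction of the masked points.
W_enc = rng.normal(scale=0.1, size=(n_feat, latent_dim))
W_dec = rng.normal(scale=0.1, size=(latent_dim, n_feat))

mask = rng.random(n_points) < mask_ratio
visible = event[~mask]

latent = visible @ W_enc                   # per-point latent codes
pooled = latent.mean(axis=0)               # event-level representation
recon = (event[mask] @ W_enc) @ W_dec      # toy reconstruction of masked points
recon_loss = np.mean((recon - event[mask]) ** 2)

# Stage 2 (supervised fine-tuning): a classification head on the pooled
# latent vector, e.g. large vs. small collision system.
W_head = rng.normal(scale=0.1, size=(latent_dim, 2))
logits = pooled @ W_head
pred = int(np.argmax(logits))
print(pooled.shape, round(float(recon_loss), 3), pred)
```

In the actual framework the encoder is a point cloud Transformer and both stages are trained by gradient descent; this sketch only fixes the data flow: mask, encode, reconstruct, then reuse the pooled latent for a downstream head.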
📝 Abstract
A central challenge in high-energy nuclear physics is to extract informative features from the high-dimensional final-state data of heavy-ion collisions (HIC) in order to enable reliable downstream analyses. Traditional approaches often rely on selected observables, which may miss subtle but physically relevant structures in the data. To address this, we introduce a Transformer-based autoencoder trained with a two-stage paradigm: self-supervised pretraining followed by supervised fine-tuning. The pretrained encoder learns latent representations directly from unlabeled HIC data, providing a compact and information-rich feature space that can be adapted to diverse physics tasks. As a case study, we apply the method to distinguish between large and small collision systems, where it achieves significantly higher classification accuracy than PointNet. Principal component analysis and SHAP interpretation further demonstrate that the autoencoder captures complex nonlinear correlations beyond individual observables, yielding features with strong discriminative and explanatory power. These results establish our two-stage framework as a general and robust foundation for feature learning in HIC, opening the door to more powerful analyses of quark-gluon plasma properties and other emergent phenomena. The implementation is publicly available at https://github.com/Giovanni-Sforza/MaskPoint-AMPT.
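The PCA half of the interpretability analysis can be illustrated with a minimal sketch. The latent vectors below are synthetic stand-ins (not real encoder outputs), with two directions given artificially large variance to mimic dominant physical structure; the SHAP part would additionally require a trained model and the `shap` library, so only the variance decomposition is shown.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical batch of event-level latent vectors from a pretrained encoder:
# 500 events x 16 latent dimensions. Two directions are inflated to mimic
# dominant structure (e.g. flow- or centrality-like trends).
latents = rng.normal(size=(500, 16))
latents[:, 0] *= 5.0
latents[:, 1] *= 3.0

# PCA via SVD of the centered feature matrix.
centered = latents - latents.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S**2 / np.sum(S**2)   # fraction of variance per principal component

print(np.round(explained[:3], 2))
```

A spectrum concentrated in a few leading components, as here, is what would indicate that the learned feature space compresses the event into a small number of physically meaningful directions.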