Adapting Vision-Language Models for Neutrino Event Classification in High-Energy Physics

📅 2025-09-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Neutrino event classification in high-energy physics—particularly for pixelated detector imagery—remains challenging due to sparse, low-signal data and limited interpretability of conventional deep learning models. Method: This work introduces, for the first time, vision-language models (VLMs) to this domain. We propose a multimodal architecture built upon fine-tuned LLaMA 3.2 and a vision encoder: detector images are encoded into visual tokens and jointly processed with textual prompts within the VLM, enabling semantic-guided, reasoning-based classification. Contribution/Results: (1) We pioneer the application of VLMs to particle physics image analysis; (2) our approach achieves superior accuracy and robustness over CNN baselines in distinguishing electron- and muon-type neutrino events, while enhancing model interpretability and cross-event-type generalization; (3) we empirically validate that multimodal fusion significantly improves detection of sparse high-energy physics signals, establishing a new physics-informed AI paradigm for future experimental analysis.

Technology Category

Application Category

📝 Abstract
Recent advances in Large Language Models (LLMs) have demonstrated their remarkable capacity to process and reason over structured and unstructured data modalities beyond natural language. In this work, we explore the applications of Vision Language Models (VLMs), specifically a fine-tuned variant of LLaMa 3.2, to the task of identifying neutrino interactions in pixelated detector data from high-energy physics (HEP) experiments. We benchmark this model against a state-of-the-art convolutional neural network (CNN) architecture, similar to those used in the NOvA and DUNE experiments, which have achieved high efficiency and purity in classifying electron and muon neutrino events. Our evaluation considers both the classification performance and interpretability of the model predictions. We find that VLMs can outperform CNNs, while also providing greater flexibility in integrating auxiliary textual or semantic information and offering more interpretable, reasoning-based predictions. This work highlights the potential of VLMs as a general-purpose backbone for physics event classification, due to their high performance, interpretability, and generalizability, which opens new avenues for integrating multimodal reasoning in experimental neutrino physics.
Problem

Research questions and friction points this paper is trying to address.

Adapting vision-language models for neutrino event classification
Benchmarking against convolutional neural networks in physics
Improving classification performance and interpretability in experiments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned LLaMa 3.2 VLM for neutrino classification
Outperforms CNNs in accuracy and interpretability
Integrates multimodal reasoning with auxiliary information
🔎 Similar Papers
No similar papers found.
D
Dikshant Sagar
Department of Computer Science, University of California, Irvine, CA 92697
K
Kaiwen Yu
Department of Computer Science, University of California, Irvine, CA 92697
A
Alejandro Yankelevich
Department of Physics, University of California, Irvine, CA 92697
Jianming Bian
Jianming Bian
University of California, Irvine
Neutrino physicsElectron Collider Physics
Pierre Baldi
Pierre Baldi
Professor, University of California, Irvine
Artificial IntelligenceDeep LearningBioinformaticsPhysicsMathematics