🤖 AI Summary
Problem: Modern physics research faces increasingly complex, labor-intensive data analysis workflows that rely heavily on manual intervention. Method: This paper proposes a large language model (LLM)-based multi-agent framework that emulates a human researcher's iterative analytical reasoning. Using current OpenAI models (GPT-4o, o4-mini, GPT-4.1, and GPT-5) and the LHC Olympics dataset, the system generates code, invokes standard scientific tools and machine learning libraries, builds on the results of previous iterations, and is tested for stability across models. Contribution/Results: Unlike conventional special-purpose algorithms, which typically rely on encoded physics knowledge, the framework is aimed at automating routine components of scientific data analysis. On the anomaly detection task, the best agent-created solutions match the performance of human state-of-the-art results, indicating that LLM-driven multi-agent systems are a feasible approach to this class of data analysis problem.
📝 Abstract
The substantial data volumes encountered in modern particle physics and other domains of fundamental physics research allow (and require) the use of increasingly complex data analysis tools and workflows. While the use of machine learning (ML) tools for data analysis has recently proliferated, these tools are typically special-purpose algorithms that rely, for example, on encoded physics knowledge to reach optimal performance. In this work, we investigate a new and orthogonal direction: using recent progress in large language models (LLMs) to create a team of agents (instances of LLMs with specific subtasks) that jointly solve data analysis-based research problems in a way similar to how a human researcher might: by creating code to operate standard tools and libraries (including ML systems) and by building on results of previous iterations. If successful, such agent-based systems could be deployed to automate routine analysis components to counteract the increasing complexity of modern tool chains. To investigate the capabilities of current-generation commercial LLMs, we consider the task of anomaly detection via the publicly available and well-studied LHC Olympics dataset. Several current models by OpenAI (GPT-4o, o4-mini, GPT-4.1, and GPT-5) are investigated and their stability is tested. Overall, we observe that the agent-based system is able to solve this data analysis problem. The best agent-created solutions mirror the performance of human state-of-the-art results.
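The iterative pattern described in the abstract (agents write code, run it, and build on the results of previous iterations) can be illustrated with a minimal sketch. This is not the authors' implementation: `run_team`, `History`, `Attempt`, and the stub functions are hypothetical names introduced here, and the "agents" are deterministic stand-ins with no LLM call, so the example only shows the control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Attempt:
    code: str     # what the "coder" agent proposed
    score: float  # metric reported after running it (higher is better)


@dataclass
class History:
    attempts: List[Attempt] = field(default_factory=list)

    def best(self) -> Optional[Attempt]:
        # Best attempt so far, or None if nothing has been run yet.
        return max(self.attempts, key=lambda a: a.score, default=None)


def run_team(
    propose: Callable[[History], str],
    execute: Callable[[str], float],
    n_rounds: int,
) -> History:
    """Alternate between proposing code and executing it, keeping all results."""
    history = History()
    for _ in range(n_rounds):
        code = propose(history)  # assumption: in a real system, an LLM call
        score = execute(code)    # assumption: in a real system, a sandboxed run
        history.attempts.append(Attempt(code, score))
    return history


# Hypothetical stand-ins: the "coder" raises a cut threshold each round,
# and the "executor" scores it by closeness to an arbitrary target of 2.0.
def stub_propose(history: History) -> str:
    threshold = 1.0 + 0.5 * len(history.attempts)
    return f"threshold={threshold}"


def stub_execute(code: str) -> float:
    threshold = float(code.split("=")[1])
    return -abs(threshold - 2.0)


history = run_team(stub_propose, stub_execute, n_rounds=4)
best = history.best()
print(best.code if best else "no attempts")
```

Passing the full history back to the proposing step is what lets each round build on earlier ones; in an LLM-based system that history would presumably be summarized into the next prompt.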