InferA: A Smart Assistant for Cosmological Ensemble Data

📅 2025-10-14
🤖 AI Summary
Problem: Analyzing terabyte-scale cosmological simulation data is inefficient because of its massive volume, complex structure, and the domain expertise it requires. Method: This paper proposes a multi-agent collaborative analysis framework based on large language models (LLMs). It adopts a hierarchical supervisor–agent architecture that interprets user intent from natural language and validates queries, and it uses data chunking, context-aware reasoning, and domain-knowledge injection to avoid full-data ingestion. Contribution/Results: The framework couples multi-agent coordination with the intrinsic characteristics of scientific data, supporting interactive, scalable, and lightweight analysis. Experiments on the HACC cosmological simulation dataset demonstrate accurate intent parsing and sub-second response times, improving the efficiency and usability of exploratory analysis on large-scale scientific data.
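To make the idea of avoiding full-data ingestion concrete, here is a minimal Python sketch of chunked aggregation: rows are streamed in fixed-size chunks and only running totals are kept in memory. This is an illustration under stated assumptions, not InferA's implementation; the helper names (`iter_chunks`, `chunked_mean`), the column names (`halo_id`, `halo_mass`), and the sample values are all hypothetical.

```python
from __future__ import annotations

import csv
import io
from itertools import islice
from typing import Iterable, Iterator


def iter_chunks(rows: Iterable[dict], chunk_size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most chunk_size rows from any row iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_mean(rows: Iterable[dict], column: str, chunk_size: int = 2) -> float:
    """Compute the mean of one column while holding only one chunk in memory."""
    total = 0.0
    count = 0
    for chunk in iter_chunks(rows, chunk_size):
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")


# Hypothetical stand-in for a large catalog file; a real run would stream from disk.
SAMPLE = "halo_id,halo_mass\n1,2.0\n2,4.0\n3,6.0\n"
reader = csv.DictReader(io.StringIO(SAMPLE))
mean_mass = chunked_mean(reader, "halo_mass")
print(mean_mass)
```

Because `csv.DictReader` is itself a lazy iterator, the same function would work on an open file handle without loading the whole file.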

📝 Abstract
Analyzing large-scale scientific datasets presents substantial challenges due to their sheer volume, structural complexity, and the need for specialized domain knowledge. Automation tools, such as PandasAI, typically require full data ingestion and lack awareness of the full data structure, making them impractical as intelligent data analysis assistants for datasets at the terabyte scale. To overcome these limitations, we propose InferA, a multi-agent system that leverages large language models to enable scalable and efficient scientific data analysis. At the core of the architecture is a supervisor agent that orchestrates a team of specialized agents responsible for distinct phases of data retrieval and analysis. The system engages interactively with users to elicit their analytical intent and confirm query objectives, ensuring alignment between user goals and system actions. To demonstrate the framework's usability, we evaluate the system using ensemble runs from the HACC cosmology simulation, which comprises several terabytes.

Problem

Research questions and friction points this paper is trying to address.

Analyzing large-scale cosmological datasets with automation tools
Overcoming data volume and complexity limitations in scientific analysis
Providing interactive intent-based analysis for terabyte-scale simulations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system with LLM for scalable analysis
Supervisor agent orchestrates specialized analysis phases
Interactive user intent elicitation for query alignment
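
The three points above can be sketched as a small supervisor loop in Python. This is a hedged illustration only: the summary does not describe InferA's actual agents or interfaces, so the `Task` record, the `Supervisor` class, the two stub agents, and the `confirm` callback are all hypothetical, and no LLM is called.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    """Shared state passed from one agent to the next (hypothetical structure)."""
    query: str
    confirmed: bool = False
    log: list[str] = field(default_factory=list)


def retrieval_agent(task: Task) -> Task:
    # Stub: a real agent would locate and load only the relevant data subset.
    task.log.append("retrieval")
    return task


def analysis_agent(task: Task) -> Task:
    # Stub: a real agent would run the requested computation on that subset.
    task.log.append("analysis")
    return task


class Supervisor:
    """Confirms the interpreted intent with the user, then runs agents in order."""

    def __init__(self, agents: list[Callable[[Task], Task]],
                 confirm: Callable[[str], bool]) -> None:
        self.agents = agents
        self.confirm = confirm

    def run(self, query: str) -> Task:
        task = Task(query)
        task.confirmed = self.confirm(f"Interpreted request: {query!r}. Proceed?")
        if not task.confirmed:
            task.log.append("aborted")
            return task
        for agent in self.agents:
            task = agent(task)
        return task


# The confirm callback stands in for the interactive exchange with the user.
supervisor = Supervisor([retrieval_agent, analysis_agent], confirm=lambda prompt: True)
result = supervisor.run("mean halo mass per simulation run")
print(result.log)
```

Keeping the confirmation step in the supervisor, before any agent runs, means a misread query is rejected before data is touched.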

Authors

Justin Z. Tam
Los Alamos National Laboratory, Los Alamos, United States
Pascal Grosset
Los Alamos National Lab
Visualization, Graphics, Data Analysis, AI / ML, Compression
Divya Banesh
Los Alamos National Laboratory, Los Alamos, United States
Nesar Ramachandra
Computational Scientist, Argonne National Laboratory
Cosmology, Machine Learning
Terece L. Turton
Los Alamos National Laboratory, Los Alamos, United States
James Ahrens
Los Alamos National Laboratory
Scientific Visualization, Large Data, Data Analysis, Data Science