🤖 AI Summary
The growing volume of scientific data has widened the gap between analytical capability and research intent. Existing AI tools, from AutoML systems to agent-based assistants, either sacrifice transparency for automation or rely on manual scripting that scales poorly, so they fail to deliver interpretability, reproducibility, and scalability together. To address this, we propose ARIA, a natural-language, specification-driven framework for scientific data analysis. ARIA uses a six-layer architecture (Command, Context, Code, Data, Orchestration, AI Module) to support human-AI collaboration, automated code generation, computational validation, and full auditability within a unified document-centric workflow. Combining NLP, AutoML, workflow orchestration, and explainable AI (XAI), ARIA identifies salient features and selects well-performing models with minimal overfitting; in the Boston Housing case it selected XGBoost with R² = 0.93. The reported result is faster analysis that is easier to reproduce.
📝 Abstract
The rapid expansion of scientific data has widened the gap between analytical capability and research intent. Existing AI-based analysis tools, ranging from AutoML frameworks to agentic research assistants, either favor automation over transparency or depend on manual scripting that hinders scalability and reproducibility. We present ARIA (Automated Research Intelligence Assistant), a spec-driven, human-in-the-loop framework for automated and interpretable data analysis. ARIA integrates six interoperable layers (Command, Context, Code, Data, Orchestration, and AI Module) within a document-centric workflow that unifies human reasoning and machine execution. Researchers define analytical goals through natural-language specifications, and ARIA then generates executable code, validates computations, and produces transparent documentation. Beyond achieving high predictive accuracy, ARIA can rapidly identify informative feature sets and select suitable models, reducing redundant tuning and repetitive experimentation. In the Boston Housing case, ARIA identified 25 key features and selected XGBoost as the best-performing model (R² = 0.93) with minimal overfitting. Evaluations across heterogeneous domains demonstrate ARIA's strong performance, interpretability, and efficiency compared with state-of-the-art systems. By combining AI-for-research and AI-for-science principles within a spec-driven architecture, ARIA establishes a new paradigm for transparent, collaborative, and reproducible scientific discovery.
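To make the reported metrics concrete, the sketch below shows one way to compute R² and a train/test "overfitting gap", then pick the candidate with the best held-out R². It is an illustration only, not ARIA's implementation: the function names (`r_squared`, `fit_mean`, `fit_line`, `select_model`), the two toy candidate models, and the data are all assumptions introduced here, and the abstract does not state how ARIA measures overfitting.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_mean(xs, ys):
    """Hypothetical baseline: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Hypothetical candidate: one-variable ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def select_model(candidates, train, test):
    """Fit each candidate on train, score on both splits, pick best test R²."""
    x_tr, y_tr = train
    x_te, y_te = test
    report = {}
    for name, fit in candidates.items():
        predict = fit(x_tr, y_tr)
        r2_train = r_squared(y_tr, [predict(x) for x in x_tr])
        r2_test = r_squared(y_te, [predict(x) for x in x_te])
        # A large positive gap suggests the model fits train far better than test.
        report[name] = {
            "r2_train": r2_train,
            "r2_test": r2_test,
            "gap": r2_train - r2_test,
        }
    best = max(report, key=lambda n: report[n]["r2_test"])
    return best, report


# Toy data (made up for illustration; not from the paper).
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x_test = [7.0, 8.0, 9.0]
y_test = [14.1, 15.8, 18.2]

best, report = select_model(
    {"mean_baseline": fit_mean, "linear": fit_line},
    (x_train, y_train),
    (x_test, y_test),
)
print(best, report[best])
```

In a real setting the candidates would be library models (for example gradient-boosted trees) scored with cross-validation rather than a single split; the selection logic stays the same.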