SciDER: Scientific Data-centric End-to-end Researcher

📅 2026-03-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current AI agents struggle to autonomously process raw scientific data, which hinders end-to-end automated research. This work proposes a scientific data-centric multi-agent system that closes the autonomous research loop: its agents collaboratively interpret raw data, generate data-driven hypotheses and experimental designs, and automatically write and execute code. The system integrates a self-evolving memory mechanism and a critic-guided feedback loop, built on large language models within a modular Python package. Evaluated on three scientific benchmarks, it outperforms both general-purpose agents and state-of-the-art models. To ease adoption in scientific workflows, the authors provide a PyPI package and a lightweight web interface.

📝 Abstract

Automated scientific discovery with large language models is transforming the research lifecycle from ideation to experimentation, yet existing agents struggle to autonomously process raw data collected from scientific experiments. We introduce SciDER, a data-centric end-to-end system that automates the research lifecycle. Unlike traditional frameworks, SciDER uses specialized agents that collaboratively parse and analyze raw scientific data, generate hypotheses and experimental designs grounded in specific data characteristics, and write and execute the corresponding code. Evaluation on three benchmarks shows that SciDER excels in specialized data-driven scientific discovery and, through its self-evolving memory and critic-led feedback loop, outperforms general-purpose agents and state-of-the-art models. SciDER is distributed as a modular Python package on PyPI with a lightweight web interface; it is intended to accelerate autonomous, data-driven research and to be accessible to all researchers and developers.
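
The abstract names a critic-led feedback loop and a self-evolving memory but does not spell out how they interact. The following is a minimal illustrative sketch of one way such a loop could be wired, not SciDER's actual implementation or API. Every name here (Memory, run_with_critic, toy_generate, toy_critique) is hypothetical, and the toy functions stand in for LLM-backed agents.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Memory:
    """Hypothetical store of lessons carried over between attempts."""
    lessons: List[str] = field(default_factory=list)

    def add(self, lesson: str) -> None:
        if lesson not in self.lessons:  # skip duplicate lessons
            self.lessons.append(lesson)


def run_with_critic(
    task: str,
    generate: Callable[[str, List[str]], str],
    critique: Callable[[str], Optional[str]],
    memory: Memory,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Generate a draft, let a critic review it, and retry with its feedback.

    `critique` returns None to accept a draft or a feedback string to reject it.
    Returns the last draft and the number of rounds used.
    """
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, memory.lessons)
        feedback = critique(draft)
        if feedback is None:
            return draft, round_no
        memory.add(feedback)  # the memory grows with each rejected draft
    return draft, max_rounds


# Toy stand-ins for LLM-backed agents (assumptions for illustration only).
def toy_generate(task: str, lessons: List[str]) -> str:
    plan = f"plan for {task}"
    if "normalize the data first" in lessons:
        plan += " (with normalization)"
    return plan


def toy_critique(draft: str) -> Optional[str]:
    return None if "normalization" in draft else "normalize the data first"


memory = Memory()
result, rounds = run_with_critic("spectra.csv", toy_generate, toy_critique, memory)
```

In this toy run the critic rejects the first draft, its feedback is stored in memory, and the second draft incorporates that lesson and is accepted. A real system would replace the toy functions with model calls and would likely persist the memory across tasks.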

Problem

Research questions and friction points this paper is trying to address.

scientific discovery

raw data processing

autonomous research

data-centric automation

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

data-centric scientific discovery

autonomous research agents

self-evolving memory