ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration

2026-05-04
AI Summary
This work addresses the tendency of LLM-driven autonomous research systems to produce plausible but unsupported conclusions over long-horizon tasks. The authors present ARIS, an open-source research harness that uses cross-model adversarial collaboration as its default configuration: an executor model advances the research while a reviewer, recommended to come from a different model family, critiques intermediate artifacts and requests revisions. The system has three architectural layers. The execution layer provides more than 65 Markdown-defined skills, model integrations via MCP, a persistent research wiki, and deterministic figure generation. The orchestration layer coordinates five end-to-end workflows with adjustable effort settings and configurable reviewer routing. The assurance layer combines a three-stage evidence check (integrity verification, result-to-claim mapping, and claim auditing) with a five-pass scientific-editing pipeline, mathematical-proof checks, and visual inspection of the rendered PDF. A prototype self-improvement loop records research traces and proposes harness improvements, which are adopted only after reviewer approval.
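The executor and reviewer interplay described above can be illustrated with a minimal sketch. Everything here is hypothetical and is not ARIS's actual API: the names `adversarial_loop`, `Review`, `toy_executor`, and `toy_reviewer` are invented for illustration, and the two toy functions stand in for calls to models from different families.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    """A reviewer verdict on one draft (hypothetical structure)."""
    approved: bool
    comments: str = ""


def adversarial_loop(
    task: str,
    executor: Callable[[str, List[str]], str],
    reviewer: Callable[[str, str], Review],
    max_rounds: int = 3,
) -> Tuple[str, List[Review]]:
    """Alternate executor drafts and reviewer critiques until approval
    or until the round budget is exhausted."""
    feedback: List[str] = []
    history: List[Review] = []
    draft = ""
    for _ in range(max_rounds):
        draft = executor(task, feedback)   # executor drives progress
        review = reviewer(task, draft)     # reviewer critiques the artifact
        history.append(review)
        if review.approved:
            break
        feedback.append(review.comments)   # revision request for next round
    return draft, history


# Toy stand-ins for two different model families (assumptions, not real models).
def toy_executor(task: str, feedback: List[str]) -> str:
    evidence = " [see run_01.json]" if feedback else ""
    return f"Claim for {task}{evidence}"


def toy_reviewer(task: str, draft: str) -> Review:
    if "[see" in draft:
        return Review(approved=True)
    return Review(approved=False, comments="cite the evidence file")


draft, history = adversarial_loop("ablation study", toy_executor, toy_reviewer)
```

In this toy run the first draft is rejected for lacking an evidence pointer and the second is approved, so `history` holds two reviews. A real reviewer would be a separate model call; the loop structure is the only part this sketch tries to capture.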
Abstract
This report describes ARIS (Auto-Research-in-sleep), an open-source research harness for autonomous research, including its architecture, assurance mechanisms, and early deployment experience. The performance of agent systems built on LLMs depends on both the model weights and the harness around them, which governs what information to store, retrieve, and present to the model. For long-horizon research workflows, the central failure mode is not a visible breakdown but a plausible unsupported success: a long-running agent can produce claims whose evidential support is incomplete, misreported, or silently inherited from the executor's framing. Therefore, we present ARIS as a research harness that coordinates machine-learning research workflows through cross-model adversarial collaboration as a default configuration: an executor model drives forward progress while a reviewer from a different model family is recommended to critique intermediate artifacts and request revisions. ARIS has three architectural layers. The execution layer provides more than 65 reusable Markdown-defined skills, model integrations via MCP, a persistent research wiki for iterative reuse of prior findings, and deterministic figure generation. The orchestration layer coordinates five end-to-end workflows with adjustable effort settings and configurable routing to reviewer models. The assurance layer includes a three-stage process for checking whether experimental claims are supported by evidence: integrity verification, result-to-claim mapping, and claim auditing that cross-checks manuscript statements against the claim ledger and raw evidence, as well as a five-pass scientific-editing pipeline, mathematical-proof checks, and visual inspection of the rendered PDF. A prototype self-improvement loop records research traces and proposes harness improvements that are adopted only after reviewer approval.
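One way to picture the three-stage evidence check is the sketch below. It is an assumption about what each stage might do, not the paper's implementation: the `LedgerEntry` fields, the SHA-256 integrity check, the numeric tolerance, and the status labels are all hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LedgerEntry:
    """Result-to-claim mapping: ties a claim ID to a value and its evidence file."""
    claim_id: str
    value: float
    evidence_file: str
    evidence_sha256: str


def verify_integrity(entry: LedgerEntry, files: Dict[str, bytes]) -> bool:
    """Integrity verification: the evidence file exists and its hash is unchanged."""
    data = files.get(entry.evidence_file)
    return data is not None and hashlib.sha256(data).hexdigest() == entry.evidence_sha256


def audit_claims(
    manuscript_claims: Dict[str, float],
    ledger: List[LedgerEntry],
    files: Dict[str, bytes],
    tol: float = 1e-6,
) -> Dict[str, str]:
    """Claim auditing: cross-check manuscript values against ledger and raw evidence."""
    by_id = {e.claim_id: e for e in ledger}
    report: Dict[str, str] = {}
    for claim_id, stated in manuscript_claims.items():
        entry = by_id.get(claim_id)
        if entry is None:
            report[claim_id] = "unsupported"          # no ledger entry at all
        elif not verify_integrity(entry, files):
            report[claim_id] = "evidence_invalid"     # file missing or altered
        elif abs(entry.value - stated) > tol:
            report[claim_id] = "misreported"          # manuscript disagrees with ledger
        else:
            report[claim_id] = "supported"
    return report


# Illustrative data only; file names and numbers are made up.
files = {"run_01.json": b'{"acc": 0.91}', "run_03.json": b'{"f1": 0.80}'}
ledger = [
    LedgerEntry("c1", 0.91, "run_01.json", hashlib.sha256(files["run_01.json"]).hexdigest()),
    LedgerEntry("c2", 0.75, "run_02.json", "0" * 64),
    LedgerEntry("c4", 0.80, "run_03.json", hashlib.sha256(files["run_03.json"]).hexdigest()),
]
manuscript = {"c1": 0.91, "c2": 0.75, "c3": 0.50, "c4": 0.85}
report = audit_claims(manuscript, ledger, files)
```

The four statuses correspond to the failure modes the abstract names: support that is incomplete (`unsupported`, `evidence_invalid`) or misreported (`misreported`), as opposed to a claim that matches its recorded evidence (`supported`).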
Problem

Research questions and friction points this paper is trying to address.

autonomous research
evidence support
plausible unsupported success
long-horizon workflows
research integrity
Innovation

Methods, ideas, or system contributions that make the work stand out.

adversarial multi-agent collaboration
autonomous research
evidence-supported claims
research harness architecture
LLM-based scientific workflow