🤖 AI Summary
This work addresses the limitations of existing automated scientific research systems, which struggle to continuously comprehend research domains, cannot structurally identify knowledge gaps, and lack mechanisms for collaborative validation among agents. To overcome these challenges, we propose AutoProf, a multi-agent framework that constructs a persistent knowledge graph as a world model of the research domain, enabling end-to-end autonomous research: from literature review and gap identification to method development and paper writing. Key innovations include structured, module-level gap discovery; a self-improving loop featuring self-correction and cross-domain mechanism search; and a consensus-based validation protocol for novel findings. By using the knowledge graph as shared memory and integrating multi-agent collaboration, bias detection, and state-of-the-art large language models, AutoProf achieves a model-agnostic, scalable architecture capable of continuous self-optimization across a spectrum from lightweight exploration to full-scale research campaigns.
Abstract
Existing automated research systems operate as stateless, linear pipelines, generating outputs without maintaining a persistent understanding of the research landscape. They process papers sequentially, propose ideas without structured gap analysis, and lack mechanisms for agents to verify or refine each other's findings. We present AutoProf (Autonomous Professor), a multi-agent orchestration framework in which specialized agents provide end-to-end AI research supervision guided by human interests, spanning literature review, gap discovery, method development, evaluation, and paper writing through autonomous exploration and self-correcting updates. Unlike sequential pipelines, AutoProf maintains a continuously evolving Research World Model, implemented as a Knowledge Graph that captures methods, benchmarks, limitations, and unexplored gaps as shared memory across agents. The framework introduces three contributions: first, structured gap discovery that decomposes methods into modules, evaluates them across benchmarks, and identifies module-level gaps; second, self-correcting discovery loops that analyze why modules succeed or fail, detect benchmark biases, and assess evaluation adequacy; third, self-improving development loops that use cross-domain mechanism search to iteratively address failing components. All agents operate under a consensus mechanism in which findings are validated before being committed to the shared model. The framework is model-agnostic, supports mainstream large language models, and scales elastically with token budget, from lightweight exploration to full-scale investigation.
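The abstract describes a consensus mechanism in which agent findings are validated before being committed to the shared Research World Model. A minimal sketch of that idea, as a quorum-gated knowledge graph, might look like the following; all class, method, and agent names here are illustrative assumptions, not the framework's actual API.

```python
# Hypothetical sketch: a shared world model whose graph only accepts a
# finding (e.g. a module-level gap) once a quorum of agents endorses it.
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A candidate fact proposed by an agent, stored as a graph edge."""
    subject: str   # e.g. a method module such as "retrieval-module"
    relation: str  # e.g. "fails_on"
    obj: str       # e.g. a benchmark name


class ResearchWorldModel:
    """Knowledge graph acting as shared memory across agents.

    A finding is committed to the graph only after `quorum` distinct
    agents have independently proposed it."""

    def __init__(self, quorum: int):
        self.quorum = quorum
        self.votes: dict[Finding, set[str]] = {}
        self.graph: set[Finding] = set()  # committed (subject, relation, obj) edges

    def propose(self, agent_id: str, finding: Finding) -> bool:
        """Record an agent's endorsement; commit on reaching quorum.
        Returns True if the finding is (now) part of the shared model."""
        self.votes.setdefault(finding, set()).add(agent_id)
        if len(self.votes[finding]) >= self.quorum:
            self.graph.add(finding)
        return finding in self.graph


# Usage: two of three agents must agree before the gap enters the graph.
wm = ResearchWorldModel(quorum=2)
gap = Finding("retrieval-module", "fails_on", "long-context-benchmark")
wm.propose("reviewer-agent", gap)   # 1 vote: not yet committed
wm.propose("evaluator-agent", gap)  # 2 votes: committed to shared memory
```

This deliberately treats consensus as a simple vote count; the paper's protocol may involve richer validation (e.g. cross-checking evidence), but the gating pattern, where no single agent can write to shared memory unilaterally, is the point being illustrated.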