DeepSurvey: Enhancing Analytical Depth and Citation Reliability in Automated Survey Generation

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing automated survey systems often produce shallow analyses and unreliable citations because they rely on abstracts and process papers in isolation. This work proposes a multi-agent framework that, for the first time, integrates fine-grained full-text parsing, cross-paper comparative analysis, and code repository insights to enrich content generation. Citation accuracy is enhanced through an evidence-constrained citation assignment mechanism that combines citation graph expansion, hybrid retrieval filtering, and explicit alignment between claims and supporting references via multi-granularity agent verification. Experimental results show that the generated surveys achieve a content quality score of 8.64/10, with citation recall and precision improved by 12.3% and 9.3%, respectively, over the strongest baseline. Notably, 83.3% of expert reviewers judged the system's output to surpass human-written surveys in overall quality, and the system generalizes more robustly than baseline methods across diverse domains.

📝 Abstract

As scientific literature grows rapidly, automated survey generation has become a key capability for AI scientists and human researchers. However, existing systems suffer from two problems: limited analytical depth, caused by reliance on abstracts and isolated paper processing, and unreliable citations, caused by imprecise retrieval and post-hoc grounding. The resulting surveys are superficial and may mislead researchers. We present DeepSurvey, an agentic system that addresses both problems. To enhance depth, DeepSurvey extracts structured keynotes from full-text papers, models cross-paper relationships through clustering and comparative analysis, and integrates code-repository analysis to recover implementation-level details. To fortify reliability, it combines citation-graph expansion with hybrid filtering for topic-focused retrieval, enforces evidence-constrained citation assignment, and deploys multi-granularity agentic refinement to validate citation-claim alignment. Experiments show that DeepSurvey achieves the highest content score (8.644/10) and the best citation quality (recall and precision gains of 12.3% and 9.3%, respectively, over the strongest baseline), generalizes more robustly across domains (a CS-to-non-CS drop of 0.14, versus 0.22 to 0.69 for baselines), and is preferred over human-written surveys by domain experts (83.3% on overall quality, 100% on content depth).
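
The abstract reports citation recall and precision without defining them. The sketch below shows one common way such metrics are computed in survey-generation evaluations: recall as the share of claims backed by at least one supporting citation, and precision as the share of attached citations that actually support their claim. This is an illustration under assumptions, not the paper's confirmed evaluation protocol; the `Claim` type, the `supporting` field (standing in for a judge's support decisions), and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A generated sentence plus its citations (hypothetical structure)."""
    text: str
    cited: tuple[str, ...]       # reference ids attached to the claim
    supporting: frozenset[str]   # ids a judge deemed to support the claim (assumed given)


def citation_recall(claims: list[Claim]) -> float:
    """Fraction of claims with at least one cited reference that supports them."""
    if not claims:
        return 0.0
    supported = sum(
        1 for c in claims if any(ref in c.supporting for ref in c.cited)
    )
    return supported / len(claims)


def citation_precision(claims: list[Claim]) -> float:
    """Fraction of attached citations that support the claim they are attached to."""
    total = sum(len(c.cited) for c in claims)
    if total == 0:
        return 0.0
    relevant = sum(1 for c in claims for ref in c.cited if ref in c.supporting)
    return relevant / total


# Made-up sample: four claims, five citations in total.
claims = [
    Claim("A", ("p1", "p2", "p6"), frozenset({"p1"})),  # one of three citations supports
    Claim("B", ("p3",), frozenset()),                    # cited, but unsupported
    Claim("C", (), frozenset({"p4"})),                   # no citation attached
    Claim("D", ("p5",), frozenset({"p5"})),              # correctly cited
]

print(citation_recall(claims))     # 2 of 4 claims are supported
print(citation_precision(claims))  # 2 of 5 citations are supporting
```

In practice the `supporting` sets would come from an automated or human judgment of whether a reference backs a claim; that judging step is outside this sketch.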

Problem

Research questions and friction points this paper is trying to address.

automated survey generation

analytical depth

citation reliability

scientific literature

survey generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

automated survey generation

analytical depth

citation reliability