SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing

📅 2025-03-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Current large language models (LLMs) face two key bottlenecks in automatically generating scientific surveys: weak logical coherence in outlines and low citation accuracy. To address these, we propose a two-stage generation framework. In Stage I, we construct high-quality outlines via logical structure analysis and scholar-guided memory retrieval, introducing the first outline-aware heuristic modeling approach. In Stage II, we integrate retrieval-augmented generation (RAG) with structured outline parsing for content synthesis. Our contributions are threefold: (1) a memory-driven content generation paradigm; (2) SurveyBench, the first comprehensive three-dimensional evaluation benchmark for scientific surveys, measuring citation accuracy, outline logicality, and content coherence; and (3) consistent and significant improvements over baselines (e.g., AutoSurvey) across all three metrics on SurveyBench.
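The two-stage framework above can be sketched in miniature. This is an illustrative skeleton only, not the paper's actual implementation: the function names, the memory representation, and the keyword-overlap retrieval are all assumptions standing in for the scholar navigation agent and the LLM calls.

```python
# Hypothetical sketch of a SurveyForge-style two-stage pipeline.
# All names and data structures here are illustrative assumptions.

def retrieve_outline_memory(topic, memory):
    """Stage I helper: fetch human-written outlines related to the topic."""
    return [o for o in memory if topic.lower() in o["topic"].lower()]

def generate_outline(topic, memory):
    """Stage I: build an outline guided by the logical structure of
    retrieved human-written outlines."""
    sections = ["Introduction"]
    for ref in retrieve_outline_memory(topic, memory):
        for sec in ref["sections"]:
            if sec not in sections:
                sections.append(sec)
    sections.append("Conclusion")
    return sections

def generate_content(outline, retrieve_papers):
    """Stage II: RAG-style drafting, grounding each section in papers
    retrieved from memory. A real system would call an LLM here; this
    stub only records which papers would back each section."""
    return {
        section: {"citations": [p["id"] for p in retrieve_papers(section)]}
        for section in outline
    }

if __name__ == "__main__":
    memory = [{"topic": "LLM surveys", "sections": ["Background", "Methods"]}]
    outline = generate_outline("LLM Surveys", memory)
    survey = generate_content(outline, lambda s: [{"id": f"toy-{s}"}])
    print(outline)
```

The key design point the sketch mirrors is the separation of concerns: outline quality is addressed first, against human structural priors, so that Stage II drafting only has to fill well-formed slots.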

๐Ÿ“ Abstract
Survey papers play a crucial role in scientific research, especially given the rapid growth of research publications. Recently, researchers have begun using LLMs to automate survey generation for better efficiency. However, the quality gap between LLM-generated surveys and those written by humans remains significant, particularly in outline quality and citation accuracy. To close these gaps, we introduce SurveyForge, which first generates the outline by analyzing the logical structure of human-written outlines and referring to retrieved domain-related articles. Subsequently, leveraging high-quality papers retrieved from memory by our scholar navigation agent, SurveyForge automatically generates and refines the content of the article. Moreover, to achieve a comprehensive evaluation, we construct SurveyBench, which includes 100 human-written survey papers for win-rate comparison and assesses AI-generated surveys across three dimensions: reference, outline, and content quality. Experiments demonstrate that SurveyForge outperforms previous works such as AutoSurvey.
Problem

Research questions and friction points this paper is trying to address.

Improves quality of LLM-generated survey papers
Enhances outline quality and citation accuracy
Provides multi-dimensional evaluation of survey papers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates outlines using logical structure analysis
Refines content with high-quality retrieved papers
Evaluates surveys across reference, outline, content
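The three-dimensional evaluation in SurveyBench can be illustrated with a toy win-rate comparison. The dimension names (reference, outline, content) come from the paper; the scoring scheme and the numbers below are invented purely for illustration.

```python
# Illustrative SurveyBench-style win-rate comparison across the paper's
# three dimensions; the scores and aggregation rule are assumptions.

DIMENSIONS = ("reference", "outline", "content")

def win_rate(scores_a, scores_b, dimensions=DIMENSIONS):
    """Fraction of (survey, dimension) pairs where system A beats system B."""
    wins = total = 0
    for a, b in zip(scores_a, scores_b):
        for dim in dimensions:
            total += 1
            if a[dim] > b[dim]:
                wins += 1
    return wins / total

# Toy scores for one survey topic (not the paper's reported results).
forge_scores = [{"reference": 0.80, "outline": 0.70, "content": 0.75}]
auto_scores = [{"reference": 0.60, "outline": 0.65, "content": 0.70}]
print(win_rate(forge_scores, auto_scores))
```

Scoring each dimension separately makes the benchmark diagnostic: a system can be shown to win on citation accuracy while losing on outline logicality, rather than collapsing everything into one opaque number.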