🤖 AI Summary
Generating publication-ready academic figures remains a key bottleneck in AI-driven research workflows. This work proposes an end-to-end, multi-agent collaborative framework that, for the first time, unifies the automatic generation of methodological diagrams and statistical charts. By integrating reference retrieval, content and style planning, image rendering, and self-critique–based iterative refinement, the framework leverages state-of-the-art vision-language and image generation models to produce high-quality figures that excel in faithfulness, conciseness, readability, and visual appeal. Evaluated on PaperBananaBench, a benchmark comprising 292 NeurIPS 2025 method-figure test cases, the proposed approach significantly outperforms existing baselines across multiple quantitative and qualitative metrics.
📝 Abstract
Despite rapid advances in autonomous AI scientists powered by language models, generating publication-ready illustrations remains a labor-intensive bottleneck in the research workflow. To lift this burden, we introduce PaperBanana, an agentic framework for automated generation of publication-ready academic illustrations. Powered by state-of-the-art VLMs and image generation models, PaperBanana orchestrates specialized agents to retrieve references, plan content and style, render images, and iteratively refine via self-critique. To rigorously evaluate our framework, we introduce PaperBananaBench, comprising 292 test cases for methodology diagrams curated from NeurIPS 2025 publications, covering diverse research domains and illustration styles. Comprehensive experiments demonstrate that PaperBanana consistently outperforms leading baselines in faithfulness, conciseness, readability, and aesthetics. We further show that our method effectively extends to the generation of high-quality statistical plots. Collectively, PaperBanana paves the way for the automated generation of publication-ready illustrations.
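The agentic loop described in the abstract (retrieve references, plan content and style, render, then refine via self-critique) could be sketched roughly as follows. This is a minimal illustrative skeleton, not the paper's implementation: every agent name, function, and stub below is a hypothetical stand-in for the VLM and image-generation calls the framework would actually make.

```python
from dataclasses import dataclass

@dataclass
class Figure:
    image: str      # placeholder for rendered image data
    critique: str   # feedback from the most recent critic pass

def retrieve_references(paper_text: str) -> list[str]:
    # Hypothetical retriever agent: select exemplar figures
    # similar to the paper's method (stubbed here).
    return ["exemplar_figure_1", "exemplar_figure_2"]

def plan(paper_text: str, references: list[str]) -> dict:
    # Hypothetical planner agent: decide content (what to draw)
    # and style (how to draw it), conditioned on references.
    return {"content": f"diagram of: {paper_text}", "style": "venue-consistent"}

def render(spec: dict) -> Figure:
    # Hypothetical renderer agent: would call an image-generation
    # model with the planned spec; stubbed as a string here.
    return Figure(image=f"render({spec['content']})", critique="")

def critique(figure: Figure, paper_text: str) -> str:
    # Hypothetical critic agent: a VLM judging faithfulness,
    # conciseness, readability, aesthetics. Empty string = accept.
    return "" if "diagram" in figure.image else "missing key module"

def generate_figure(paper_text: str, max_rounds: int = 3) -> Figure:
    # Orchestration: retrieve -> plan -> render -> critique loop.
    refs = retrieve_references(paper_text)
    spec = plan(paper_text, refs)
    fig = render(spec)
    for _ in range(max_rounds):
        feedback = critique(fig, paper_text)
        if not feedback:
            break  # critic accepted the figure
        spec["content"] += f" | revise: {feedback}"  # fold feedback into the plan
        fig = render(spec)
    return fig
```

The key design choice this sketch highlights is that refinement is closed-loop: the critic's feedback is fed back into the planning spec rather than applied as a post-hoc image edit, so each round re-renders from an updated plan.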