Stealing Creator's Workflow: A Creator-Inspired Agentic Framework with Iterative Feedback Loop for Improved Scientific Short-form Generation

πŸ“… 2025-04-26
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the dual challenges of factual inaccuracy and weak visual expressiveness in scientific short-video generation, this paper proposes SciTalkβ€”a multi-LLM collaborative agent framework. SciTalk emulates human creator workflows by orchestrating specialized agents: a summarization agent, a visual planning agent, and a text/layout editing agent, which jointly ground content through cross-modal knowledge integration (e.g., paper text, figures, visual styles, and virtual avatars). Crucially, it introduces a user-role-simulation-based iterative feedback mechanism that dynamically refines generation prompts to bridge the semantic gap between expert knowledge and public comprehension. Experiments demonstrate that SciTalk significantly outperforms baseline methods in both factual accuracy and audience engagement; moreover, multi-round iteration consistently improves performance across both dimensions. This work establishes a reproducible, scalable paradigm for AI-augmented science communication.

Technology Category

Application Category

πŸ“ Abstract
Generating engaging, accurate short-form videos from scientific papers is challenging due to content complexity and the gap between expert authors and readers. Existing end-to-end methods often suffer from factual inaccuracies and visual artifacts, limiting their utility for scientific dissemination. To address these issues, we propose SciTalk, a novel multi-LLM agentic framework, grounding videos in various sources, such as text, figures, visual styles, and avatars. Inspired by content creators' workflows, SciTalk uses specialized agents for content summarization, visual scene planning, and text and layout editing, and incorporates an iterative feedback mechanism where video agents simulate user roles to give feedback on generated videos from previous iterations and refine generation prompts. Experimental evaluations show that SciTalk outperforms simple prompting methods in generating scientifically accurate and engaging content over the refined loop of video generation. Although preliminary results are still not yet matching human creators' quality, our framework provides valuable insights into the challenges and benefits of feedback-driven video generation. Our code, data, and generated videos will be publicly available.
Problem

Research questions and friction points this paper is trying to address.

Generating accurate short-form videos from complex scientific papers
Overcoming factual inaccuracies in existing end-to-end video generation methods
Bridging the gap between expert authors and general audience comprehension
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-LLM agentic framework for video generation
Iterative feedback loop refining prompts
Specialized agents for content and visuals