🤖 AI Summary
This work addresses the high barrier to creating interactive educational documents and the limited controllability and verifiability of outputs produced by existing large language model (LLM) agents. To overcome these challenges, the authors propose ViviDoc, a system built on a multi-agent collaboration framework—comprising a Planner, Executor, and Evaluator—together with DocSpec, an intermediate representation that decouples interactive logic into four orthogonal components: State, Render, Transition, and Constraint. This design lets users inspect and refine content before code generation, improving both transparency and user control. Expert evaluation and a user study show that ViviDoc substantially outperforms naive agentic generation, with significant gains in output quality, controllability, and the overall editing experience.
📝 Abstract
Interactive articles help readers engage with complex ideas through exploration, yet creating them remains costly, requiring both domain expertise and web development skills. Recent LLM-based agents can automate content creation, but naively applying them yields uncontrollable and unverifiable outputs. We present ViviDoc, a human-agent collaborative system that generates interactive educational documents from a single topic input. ViviDoc introduces a multi-agent pipeline (Planner, Executor, Evaluator) and the Document Specification (DocSpec), a human-readable intermediate representation that decomposes each interactive visualization into State, Render, Transition, and Constraint components. The DocSpec enables educators to review and refine generation plans before code is produced, bridging the gap between pedagogical intent and executable output. Expert evaluation and a user study show that ViviDoc substantially outperforms naive agentic generation and provides an intuitive editing experience. Our project homepage is available at https://vividoc-homepage.vercel.app/.
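To make the DocSpec decomposition concrete, here is a minimal sketch of what one entry might look like. The interface and field names (`state`, `render`, `transitions`, `constraints`) and the pendulum example are illustrative assumptions, not the paper's actual schema:

```typescript
// Hypothetical DocSpec entry for a single interactive visualization.
// All names here are assumptions for illustration only.
interface DocSpec {
  // State: the variables the visualization tracks.
  state: Record<string, number | string | boolean>;
  // Render: a human-readable description of what is drawn from the state.
  render: string;
  // Transitions: how user events update the state.
  transitions: { event: string; update: string }[];
  // Constraints: invariants an evaluator agent could check before codegen.
  constraints: string[];
}

const pendulumSpec: DocSpec = {
  state: { angle: 30, length: 1.0, running: false },
  render: "Draw a pendulum at `angle` degrees on a rod of `length` meters",
  transitions: [
    { event: "drag", update: "set angle to the pointer position" },
    { event: "play", update: "set running = true and animate angle over time" },
  ],
  constraints: ["-90 <= angle <= 90", "length > 0"],
};
```

Because each component is orthogonal, an educator could edit the `constraints` or `transitions` in such a representation without touching rendering details, which is the kind of pre-codegen review the paper describes.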