🤖 AI Summary
This work addresses the hallucinations that large language models (LLMs) exhibit when generating algorithm visualization videos end to end, which manifest as execution errors, overlapping visual elements, and inter-frame inconsistency. To mitigate these problems, the authors propose a decoupled paradigm that separates execution from rendering. First, an LLM-generated Python tracker produces a verifiable execution trace conforming to Visualization Trace Algebra (VTA). Then a deterministic renderer, supporting Manim, LaTeX/TikZ, and Three.js backends, produces the visualization according to templates written in a Rendering Style Language (RSL). By introducing VTA and RSL, the approach explicitly disentangles algorithmic logic from visual presentation. On a benchmark of 200 LeetCode tasks, the method achieves a 99.8% success rate, 17.3 percentage points above end-to-end baselines (82.5%), substantially reducing hallucinations and improving visual consistency.
📝 Abstract
Algorithm Visualization (AV) helps students build mental models by animating algorithm execution states. Recent LLM-based systems such as CODE2VIDEO generate AV videos end to end. However, this paradigm requires the system to simultaneously simulate the algorithm's flow and satisfy video rendering constraints, such as element layout and color schemes. This combined burden induces LLM hallucinations, resulting in reduced execution success rates, element overlap, and inter-frame inconsistencies.
To address these challenges, we propose ALGOGEN, a novel paradigm that decouples algorithm execution from rendering. We first introduce Visualization Trace Algebra (VTA), a monoid over algorithm visual states and operations. The LLM then generates a Python tracker that simulates algorithm flow and outputs VTA-JSON traces, a JSON encoding of VTA. For rendering, we define a Rendering Style Language (RSL) to templatize algorithm layouts. A deterministic renderer then compiles algorithm traces with RSL into Manim, LaTeX/TikZ, or Three.js outputs.
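The decoupling can be illustrated with a minimal Python sketch. It is not ALGOGEN's implementation: the operation names (`highlight`, `swap`), the `Op`/`Trace` classes, the JSON field names, and the `STYLE` dictionary standing in for RSL are all hypothetical, since the actual VTA-JSON schema and RSL syntax are not given here. Traces are modeled as sequences of operations under concatenation, which is associative with the empty trace as identity (a free monoid), and replaying a trace folds each operation over a visual state. A tracker instruments bubble sort to emit operations instead of drawing frames, and a deterministic text renderer maps each state through the style template in place of a Manim, TikZ, or Three.js backend.

```python
import json
from dataclasses import dataclass


# Hypothetical VTA-style operation (names and fields are illustrative only).
@dataclass(frozen=True)
class Op:
    kind: str    # "highlight" or "swap" in this sketch
    args: tuple  # element indices the operation touches


@dataclass(frozen=True)
class Trace:
    """Sequence of ops; + is associative and EMPTY is its identity (a free monoid)."""
    ops: tuple = ()

    def __add__(self, other: "Trace") -> "Trace":
        return Trace(self.ops + other.ops)


EMPTY = Trace()


def apply_op(state: dict, op: Op) -> dict:
    """Return the visual state after one op; the input state is not mutated."""
    arr, hl = list(state["array"]), state["highlight"]
    if op.kind == "swap":
        i, j = op.args
        arr[i], arr[j] = arr[j], arr[i]
    elif op.kind == "highlight":
        hl = op.args
    else:
        raise ValueError(f"unknown op: {op.kind}")
    return {"array": arr, "highlight": hl}


def run(state: dict, trace: Trace) -> list:
    """Replay a trace, returning the initial state followed by every intermediate state."""
    frames = [state]
    for op in trace.ops:
        frames.append(apply_op(frames[-1], op))
    return frames


def track_bubble_sort(values: list) -> Trace:
    """Hypothetical tracker: executes bubble sort and records ops, not pixels."""
    a, trace = list(values), EMPTY
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            trace = trace + Trace((Op("highlight", (i, i + 1)),))
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                trace = trace + Trace((Op("swap", (i, i + 1)),))
    return trace


def to_json(trace: Trace) -> str:
    """Illustrative JSON encoding of a trace (not the real VTA-JSON schema)."""
    return json.dumps([{"op": op.kind, "args": list(op.args)} for op in trace.ops])


# Hypothetical stand-in for an RSL template: layout lives in data, not in the trace.
STYLE = {"cell": "[{v}]", "active": "<{v}>"}


def render_text(frame: dict, style: dict = STYLE) -> str:
    """Deterministic text backend: the same frame and style always yield the same string."""
    return " ".join(
        (style["active"] if idx in frame["highlight"] else style["cell"]).format(v=v)
        for idx, v in enumerate(frame["array"])
    )


start = {"array": [3, 1, 2], "highlight": ()}
trace = track_bubble_sort(start["array"])
print(to_json(trace))
for frame in run(start, trace):
    print(render_text(frame))
```

Because the renderer reads only the trace and the style table, changing the layout in this sketch means editing `STYLE` rather than regenerating the trace. The monoid structure also means `run(s, a + b)[-1] == run(run(s, a)[-1], b)[-1]`, so a long trace can be checked piece by piece.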
Evaluated on a LeetCode AV benchmark of 200 tasks, ALGOGEN improves the average success rate over end-to-end methods by 17.3 percentage points (99.8% vs. 82.5%). These results demonstrate that our decoupling paradigm effectively mitigates LLM hallucinations in complex AV tasks, providing a more reliable solution for the automated generation of high-quality algorithm visualizations. Demo videos and code are available in the project repository.
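The reported 17.3 is the absolute gap between the two success rates, i.e. percentage points; measured relative to the 82.5% baseline, the improvement is larger. A quick check using only the two rates stated above:

```python
ours, baseline = 99.8, 82.5  # success rates (%) as reported in the abstract

absolute_gain = ours - baseline                  # difference in percentage points
relative_gain = absolute_gain / baseline * 100   # percent change relative to the baseline

print(f"{absolute_gain:.1f} pp absolute, {relative_gain:.1f}% relative")
```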