🤖 AI Summary
This work proposes LingBot-World, the first open-source world simulator to integrate high-fidelity video generation, minute-scale temporal consistency, and sub-second interactive latency within a unified framework. Existing open-source world models lag significantly behind their closed-source counterparts in environmental diversity, long-term temporal coherence, and real-time interactivity. By introducing a temporal consistency control mechanism and a low-latency inference architecture, LingBot-World enables real-time generation at 16 frames per second with sub-second response times across diverse visual styles, including realistic, scientific, and cartoon-like scenes. The release of its code and models substantially narrows the performance gap between open-source and proprietary world-modeling systems.
📝 Abstract
We present LingBot-World, an open-source world simulator built on video generation. Positioned as a top-tier world model, LingBot-World offers the following features. (1) It maintains high fidelity and robust dynamics across a broad spectrum of environments, including realistic, scientific, and cartoon-style settings, and beyond. (2) It sustains a minute-level horizon while preserving contextual consistency over time, a capability also known as "long-term memory". (3) It supports real-time interactivity, achieving a latency of under 1 second while producing 16 frames per second. We provide public access to the code and model in an effort to narrow the divide between open-source and closed-source technologies. We believe this release will empower the community with practical applications in areas such as content creation, gaming, and robot learning.
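To make the headline numbers concrete, the following back-of-envelope sketch (ours, not from the paper's released code) works out what a minute-level horizon at 16 fps and sub-second latency imply per frame:

```python
# Back-of-envelope arithmetic implied by the abstract's figures.
# Assumptions (illustrative only): the "minute-level horizon" is taken as 60 s,
# and "latency under 1 second" is read as a <1000 ms budget per generated chunk.
FPS = 16
HORIZON_SECONDS = 60

# A one-minute rollout at 16 fps must stay consistent over this many frames:
frames_per_minute = FPS * HORIZON_SECONDS

# If a full second of video (16 frames) must be produced in under 1 s,
# the average per-frame budget is at most:
per_frame_budget_ms = 1000 / FPS

print(frames_per_minute)     # 960
print(per_frame_budget_ms)   # 62.5
```

So "long-term memory" here means staying coherent across roughly a thousand frames, while real-time interactivity leaves at most ~62.5 ms of compute per frame on average.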