🤖 AI Summary
To address the latency bottleneck of existing super-resolution methods in cloud gaming—where millisecond-level end-to-end delay constraints render conventional approaches too slow for real-time high-definition video streaming—this paper proposes River, a lightweight, online-adaptive super-resolution framework. River leverages the spatiotemporal redundancy inherent in game videos to design a content-aware encoder and a dynamic model lookup-and-reuse mechanism, enabling millisecond-scale online scheduling and incremental model updates. It further incorporates client-side weight prefetching to eliminate network transmission bottlenecks. Compared to state-of-the-art methods, River reduces redundant training overhead by 44%, improves PSNR by 1.81 dB, and achieves real-time neural enhancement at approximately 720p and 20 fps on mobile devices.
📝 Abstract
Online cloud gaming demands real-time, high-quality video transmission across variable wide-area networks (WANs). Neural-enhanced video transmission algorithms that employ super-resolution (SR) for quality enhancement have proven effective in challenging WAN environments. However, these SR-based methods require intensive fine-tuning over the whole video, making them infeasible for diverse online cloud gaming. To address this, we introduce River, a cloud gaming delivery framework designed around the observation that video segment features in cloud gaming are typically repetitive and redundant. This presents a significant opportunity to reuse fine-tuned SR models, reducing fine-tuning latency of minutes to query latency of milliseconds. To realize this idea, we design a practical system that addresses several challenges, including model organization, online model scheduling, and transfer strategy. River first builds a content-aware encoder that fine-tunes SR models for diverse video segments and stores them in a lookup table. When delivering cloud gaming video streams online, River checks the video features and retrieves the most relevant SR model to enhance frame quality. Meanwhile, if no existing SR model performs well enough for some video segments, River further fine-tunes new models and updates the lookup table. Finally, to avoid the overhead of streaming model weights to clients, River designs a prefetching strategy that predicts the models most likely to be retrieved. Our evaluation on real video game streaming demonstrates that River reduces redundant training overhead by 44% and improves the Peak Signal-to-Noise Ratio (PSNR) by 1.81 dB compared to state-of-the-art solutions. Practical deployment shows River meets real-time requirements, achieving approximately 720p at 20 fps on mobile devices.
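The lookup-and-reuse workflow described above can be illustrated with a minimal sketch. All names here (`ModelLookupTable`, `enhance_segment`, the cosine-similarity matching, and the 0.9 threshold) are illustrative assumptions, not the paper's actual encoder or data structures; the point is only the control flow: a segment's feature vector is matched against a table of previously fine-tuned SR models, a hit reuses an existing model in milliseconds, and a miss falls back to fine-tuning a new model and inserting it into the table.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors (stand-in for the
    paper's content-aware similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class ModelLookupTable:
    """Maps segment feature vectors to fine-tuned SR model ids."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (feature_vector, model_id)

    def query(self, features):
        """Return the best-matching model id, or None if nothing is close enough."""
        best_id, best_sim = None, self.threshold
        for feat, model_id in self.entries:
            sim = cosine_sim(features, feat)
            if sim >= best_sim:
                best_id, best_sim = model_id, sim
        return best_id

    def insert(self, features, model_id):
        self.entries.append((features, model_id))

def enhance_segment(table, features, fine_tune):
    """Fast path: reuse a cached SR model. Slow path: fine-tune a new one
    (minutes, done out of band) and cache it for future segments."""
    model_id = table.query(features)
    if model_id is None:
        model_id = fine_tune(features)  # expensive fallback on a cache miss
        table.insert(features, model_id)
    return model_id
```

A near-duplicate segment then hits the cached model instead of triggering a fresh fine-tune, which is the source of the reported reduction in redundant training.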