🤖 AI Summary
This work proposes FC-r, a lightweight adaptive video inference framework based on fuzzy control, addressing a common oversight in existing methods: the failure to balance resource efficiency with inference performance, which often results in either excessive computational overhead or compromised accuracy. By modeling spatiotemporal correlations between consecutive frames and accounting for real-time device resource conditions, FC-r dynamically schedules models of multiple scales during inference. This approach overcomes the longstanding trade-off between accuracy and resource consumption, enabling resource-aware, efficient video inference. Experimental results demonstrate that the proposed framework significantly improves resource utilization while maintaining high accuracy, effectively balancing computational cost against inference performance.
📝 Abstract
Existing video inference (VI) enhancement methods typically aim to improve performance by scaling up model sizes and employing sophisticated network architectures. While these approaches have demonstrated state-of-the-art performance, they often overlook the trade-off between resource efficiency and inference effectiveness, leading to inefficient resource utilization and suboptimal inference performance. To address this problem, a fuzzy controller (FC-r) is developed based on key system parameters and inference-related metrics. Guided by FC-r, a VI enhancement framework is proposed that leverages the spatiotemporal correlation of targets across adjacent video frames. Given the real-time resource conditions of the target device, the framework can dynamically switch between models of varying scales during VI. Experimental results demonstrate that the proposed method effectively balances resource utilization and inference performance.
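To make the control idea concrete, the following is a minimal, hypothetical sketch of how a fuzzy controller might map two inputs named in the abstract, device load and the spatiotemporal correlation between adjacent frames, to a choice among models of different scales. The membership functions, rule base, and all names here are illustrative assumptions, not the paper's actual FC-r design.

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def fuzzy_select_model(cpu_util, frame_similarity):
    """Illustrative fuzzy rule evaluation (not the paper's actual controller).

    cpu_util: current device utilization, normalized to [0, 1].
    frame_similarity: spatiotemporal correlation of adjacent frames in [0, 1].
    Returns the model scale with the strongest rule activation.
    """
    # Fuzzify the inputs with assumed triangular membership functions.
    load = {
        "low": tri(cpu_util, -0.5, 0.0, 0.5),
        "medium": tri(cpu_util, 0.2, 0.5, 0.8),
        "high": tri(cpu_util, 0.5, 1.0, 1.5),
    }
    sim = {
        "low": tri(frame_similarity, -0.5, 0.0, 0.5),
        "high": tri(frame_similarity, 0.5, 1.0, 1.5),
    }
    # Assumed rule base: high inter-frame similarity means a small model can
    # exploit results from previous frames; a lightly loaded device can afford
    # a larger, more accurate model.
    rules = {
        "small": max(min(load["high"], sim["high"]),
                     min(load["medium"], sim["high"])),
        "medium": max(min(load["medium"], sim["low"]),
                      min(load["high"], sim["low"])),
        "large": min(load["low"], sim["low"]),
    }
    return max(rules, key=rules.get)
```

For example, a heavily loaded device processing near-static frames would select the small model, while an idle device facing rapid scene changes would select the large one. The actual FC-r presumably uses richer system parameters and inference metrics than the two inputs sketched here.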