🤖 AI Summary
Current video analysis methods and multimodal large models struggle to recognize events that are simultaneously fast, frequent, and fine-grained (F³), primarily because of motion blur and subtle visual distinctions. To address this, we introduce F³Set, the first dedicated benchmark for F³ event understanding, comprising over one thousand event categories, precise temporal annotations, and multi-level semantic granularity. We propose a novel formal definition of F³ events and a corresponding evaluation paradigm. Furthermore, we design F³ED, a specialized detection model integrating high-frame-rate modeling, temporally precise localization, multi-scale feature alignment, and fine-grained classification modules. Extensive experiments on multiple sports video datasets demonstrate that F³ED achieves an average 12.7% improvement in mean Average Precision (mAP) over state-of-the-art methods. All code, models, and the F³Set benchmark are publicly released.
📝 Abstract
Analyzing Fast, Frequent, and Fine-grained (F$^3$) events presents a significant challenge in video analytics and multi-modal LLMs. Current methods struggle to identify events that satisfy all the F$^3$ criteria with high accuracy due to challenges such as motion blur and subtle visual discrepancies. To advance research in video understanding, we introduce F$^3$Set, a benchmark that consists of video datasets for precise F$^3$ event detection. Datasets in F$^3$Set are characterized by their extensive scale and comprehensive detail, usually encompassing over 1,000 event types with precise timestamps and supporting multi-level granularity. Currently, F$^3$Set contains several sports datasets, and the framework may be extended to other applications as well. We evaluate popular temporal action understanding methods on F$^3$Set, revealing substantial challenges for existing techniques. Additionally, we propose a new method, F$^3$ED, for F$^3$ event detection, which achieves superior performance. The dataset, model, and benchmark code are available at https://github.com/F3Set/F3Set.