🤖 AI Summary
This work addresses the challenges of online action detection—namely, high computational cost and difficulty in distinguishing foreground actions from background motion—by proposing the CAKE framework. CAKE transfers motion knowledge from optical flow to an RGB model via motion distillation and introduces a novel Dynamic Motion Adapter (DMA) to implicitly model temporal dynamics. Furthermore, it employs a floating contrastive learning strategy to enhance discrimination between foreground actions and background noise. Notably, the method eliminates the need for explicit optical flow computation, achieving state-of-the-art mean average precision (mAP) on TVSeries, THUMOS'14, and Kinetics-400 while running at over 72 FPS on a single CPU. This combination of efficiency and accuracy makes CAKE well-suited for deployment in resource-constrained environments.
📝 Abstract
Online Action Detection (OAD) systems face two primary challenges: high computational cost and insufficient modeling of discriminative temporal dynamics against background motion. Optical flow provides strong motion cues but incurs significant computational overhead. We propose CAKE, an OAD framework that distills flow-based motion knowledge into RGB models. We introduce a Dynamic Motion Adapter (DMA) that suppresses static background noise and emphasizes pixel changes, effectively approximating optical flow without explicit computation. The framework also integrates a Floating Contrastive Learning strategy to distinguish informative motion dynamics from temporal background. Extensive experiments on the TVSeries, THUMOS'14, and Kinetics-400 datasets demonstrate the effectiveness of our model: CAKE achieves state-of-the-art mAP while using the same backbone as prior methods, and operates at over 72 FPS on a single CPU, making it highly suitable for resource-constrained systems.
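The core idea of approximating optical flow without computing it — emphasizing pixels that change over time while suppressing static background — can be illustrated with a minimal temporal-differencing sketch. This is not the paper's actual DMA (which is a learned adapter inside an RGB network); the function name `motion_cue` and the background-subtraction heuristic below are illustrative assumptions only.

```python
import numpy as np

def motion_cue(frames, alpha=0.9):
    """Hypothetical sketch: derive a cheap motion map from raw frames
    by temporal differencing. NOT the paper's DMA; it only illustrates
    emphasizing pixel changes without explicit optical flow.

    frames: array of shape (T, H, W), grayscale intensities in [0, 1].
    Returns motion maps of shape (T-1, H, W).
    """
    diffs = np.abs(np.diff(frames, axis=0))          # frame-to-frame change
    background = diffs.mean(axis=0, keepdims=True)   # average change level per pixel
    # Keep only changes that exceed a fraction of the per-pixel baseline,
    # so persistently "noisy" background regions are damped.
    return np.clip(diffs - alpha * background, 0.0, None)

# Toy usage: a static scene with a small bright square sliding right
T, H, W = 5, 8, 8
frames = np.zeros((T, H, W))
for t in range(T):
    frames[t, 2:4, t:t + 2] = 1.0   # square occupies cols t..t+1 each frame
maps = motion_cue(frames)
print(maps.shape)  # (4, 8, 8)
```

Pixels the square never crosses stay at zero in every motion map, while pixels along its path light up — the kind of foreground/background separation the distilled RGB model is trained to capture directly from features.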