LiquidTAD: An Efficient Method for Temporal Action Detection via Liquid Neural Dynamics

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high computational complexity, parameter redundancy, and deployment challenges of existing Transformer-based temporal action detection methods by proposing LiquidTAD, which the authors describe as the first framework to integrate parallelized liquid neural networks into this domain. At its core, the ActionLiquid module uses the closed-form continuous-time (CfC) formulation to model temporal dynamics with linear complexity, capturing action dependencies while using learnable time constants to adapt its sensitivity to varying action durations. On THUMOS-14, LiquidTAD achieves a highly competitive average mAP of 69.46% with only 10.82M parameters (63% fewer than ActionFormer). On the ActivityNet-1.3 and Ego4D benchmarks, it shows a favorable accuracy-efficiency trade-off and robustness to temporal sampling variations.

📝 Abstract

Temporal Action Detection (TAD) in untrimmed videos is currently dominated by Transformer-based architectures. While high-performing, their quadratic computational complexity and substantial parameter redundancy limit deployment in resource-constrained environments. In this paper, we propose LiquidTAD, a novel parameter-efficient framework that replaces cumbersome self-attention layers with parallelized ActionLiquid blocks. Unlike traditional Liquid Neural Networks (LNNs), which suffer from sequential execution bottlenecks, LiquidTAD leverages a closed-form continuous-time (CfC) formulation, allowing the model to be reformulated as a parallelizable operator while preserving the intrinsic physical prior of continuous-time dynamics. This architecture captures complex temporal dependencies with linear $O(N)$ complexity and adaptively modulates temporal sensitivity through learned time constants ($τ$), providing a robust mechanism for handling varying action durations. To the best of our knowledge, this work is the first to introduce a parallelized LNN-based architecture to the TAD domain. Experimental results on the THUMOS-14 dataset demonstrate that LiquidTAD achieves a highly competitive average mAP of 69.46% with only 10.82M parameters, a 63% reduction compared to the ActionFormer baseline. Further evaluations on the ActivityNet-1.3 and Ego4D benchmarks confirm that LiquidTAD achieves a strong accuracy-efficiency trade-off and exhibits superior robustness to temporal sampling variations, advancing the Pareto frontier of modern TAD frameworks.
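
To make the idea of a parallelizable CfC-style operator more concrete, the following is a minimal NumPy sketch. It is not the authors' ActionLiquid block, whose details the abstract does not give. It assumes the commonly cited CfC gating form, a sigmoid gate that interpolates between two learned branches, and it assumes that all three branches are computed from a small local window of input features rather than from a recurrent state, which is what lets every time step be evaluated at once. The names `cfc_style_block`, `local_context`, `init_params`, and `log_tau` are hypothetical, as is the choice to divide the gate argument by a per-channel time constant.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(dim, seed=0):
    """Randomly initialised weights for the illustrative block (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(dim)
    return {
        "W_f": rng.normal(0.0, scale, (dim, dim)),
        "W_g": rng.normal(0.0, scale, (dim, dim)),
        "W_h": rng.normal(0.0, scale, (dim, dim)),
        "b_f": np.zeros(dim),
        "b_g": np.zeros(dim),
        "b_h": np.zeros(dim),
        # One time constant per channel, stored in log space so tau stays positive.
        "log_tau": np.zeros(dim),
    }


def local_context(x):
    """Average each time step with its two neighbours (edge-padded), O(N) in length."""
    padded = np.pad(x, ((1, 1), (0, 0)), mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0


def cfc_style_block(x, params, dt=1.0):
    """Apply a CfC-style gated interpolation to all N time steps at once.

    x has shape (N, D). Nothing here depends on a previous hidden state
    (an assumption of this sketch), so there is no sequential loop and the
    cost grows linearly with N.
    """
    ctx = local_context(x)
    f = ctx @ params["W_f"] + params["b_f"]           # drives the gate
    g = np.tanh(ctx @ params["W_g"] + params["b_g"])  # first branch
    h = np.tanh(ctx @ params["W_h"] + params["b_h"])  # second branch
    tau = np.exp(params["log_tau"])                   # learned time constants
    gate = sigmoid(-f * dt / tau)                     # larger tau flattens the gate
    return gate * g + (1.0 - gate) * h


features = np.random.default_rng(1).normal(size=(16, 8))  # 16 time steps, 8 channels
out = cfc_style_block(features, init_params(8))
print(out.shape)
```

In this sketch a larger `tau` pushes the gate toward 0.5, so that channel responds less sharply to changes in its input. This is one plausible reading of how learned time constants could modulate temporal sensitivity, not a description of the paper's mechanism.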

Problem

Research questions and friction points this paper is trying to address.

Temporal Action Detection

Computational Complexity

Parameter Efficiency

Resource-Constrained Deployment

Untrimmed Videos

Innovation

Methods, ideas, or system contributions that make the work stand out.

Liquid Neural Networks

Temporal Action Detection

Closed-form Continuous-time