LiquidTAD: An Efficient Method for Temporal Action Detection via Liquid Neural Dynamics

📅 2026-04-20

🤖 AI Summary
This work addresses the high computational cost, parameter redundancy, and deployment difficulty of existing Transformer-based temporal action detection (TAD) methods. It proposes LiquidTAD, which the authors describe as the first framework to bring parallelized liquid neural networks into TAD. Its core ActionLiquid module uses the closed-form continuous-time (CfC) formulation to model temporal dynamics with complexity linear in sequence length. Learnable time constants let it adapt its temporal sensitivity to actions of different durations. On THUMOS-14, LiquidTAD reaches a highly competitive average mAP of 69.46% with only 10.82M parameters, 63% fewer than ActionFormer. Evaluations on ActivityNet-1.3 and Ego4D further show a favorable accuracy-efficiency trade-off and robustness to changes in temporal sampling.
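
For orientation, here is the closed-form continuous-time (CfC) cell of Hasani et al. (2022), which the summary says ActionLiquid builds on. CfC replaces numerical ODE solving with an explicit, time-gated blend of two network heads. This is the generic CfC form, not LiquidTAD's exact block. The summary does not say how ActionLiquid parameterizes $f$, $g$, $h$ or where the learned $\tau$ enters.

$$
\mathbf{x}(t) = \sigma\big(-f(\mathbf{x}, \mathbf{I}; \theta_f)\, t\big) \odot g(\mathbf{x}, \mathbf{I}; \theta_g) + \Big[1 - \sigma\big(-f(\mathbf{x}, \mathbf{I}; \theta_f)\, t\big)\Big] \odot h(\mathbf{x}, \mathbf{I}; \theta_h)
$$

The symbols are as follows:

- $\sigma$ is the logistic sigmoid.
- $\odot$ is element-wise multiplication.
- $\mathbf{I}$ is the input.
- $t$ is elapsed time.
- $f$, $g$, $h$ are learned network heads.

At $t = 0$ the output is an even blend of $g$ and $h$. As $t$ grows, it moves toward $h$ (for positive $f$) at a rate set by $f$, which is what makes the dynamics input-dependent, or "liquid".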

📝 Abstract
Temporal Action Detection (TAD) in untrimmed videos is currently dominated by Transformer-based architectures. While high-performing, their quadratic computational complexity and substantial parameter redundancy limit deployment in resource-constrained environments. In this paper, we propose LiquidTAD, a novel parameter-efficient framework that replaces cumbersome self-attention layers with parallelized ActionLiquid blocks. Unlike traditional Liquid Neural Networks (LNNs) that suffer from sequential execution bottlenecks, LiquidTAD leverages a closed-form continuous-time (CfC) formulation, allowing the model to be reformulated as a parallelizable operator while preserving the intrinsic physical prior of continuous-time dynamics. This architecture captures complex temporal dependencies with $O(N)$ linear complexity and adaptively modulates temporal sensitivity through learned time constants ($\tau$), providing a robust mechanism for handling varying action durations. To the best of our knowledge, this work is the first to introduce a parallelized LNN-based architecture to the TAD domain. Experimental results on the THUMOS-14 dataset demonstrate that LiquidTAD achieves a highly competitive Average mAP of 69.46% with only 10.82M parameters, a 63% reduction compared to the ActionFormer baseline. Further evaluations on ActivityNet-1.3 and Ego4D benchmarks confirm that LiquidTAD achieves an optimal accuracy-efficiency trade-off and exhibits superior robustness to temporal sampling variations, advancing the Pareto frontier of modern TAD frameworks.
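
The abstract does not spell out how ActionLiquid is made parallel. The sketch below shows one common way to get linear-time, parallelizable continuous-time dynamics. It is offered purely as an assumption and is not the authors' method. The idea has three parts:

1. Hold an input-dependent ("liquid") decay rate and target fixed over each step.
2. Solve the resulting first-order ODE in closed form.
3. Evaluate the resulting linear recurrence with an associative scan.

The Python sketch does this for a single channel. All of the following are hypothetical: `step_coefficients`, `combine`, `parallel_scan`, the softplus rate, the tanh target, and the scalar weights `w_f`, `b_f`, `w_g`, `b_g`. The parameter `tau` stands in for a learnable time constant. It sets a floor of 1/`tau` on the decay rate, so a larger `tau` permits longer memory.

```python
import math
from typing import List, Tuple

Pair = Tuple[float, float]  # (a, b) encodes the affine map s -> a * s + b


def softplus(z: float) -> float:
    return math.log1p(math.exp(z))


def step_coefficients(x: List[float], tau: float, w_f: float, b_f: float,
                      w_g: float, b_g: float, dt: float = 1.0) -> List[Pair]:
    """Per-step (a, b) coefficients for a hypothetical single-channel model.

    ODE assumed: ds/dt = -k_t * (s - g_t), with an input-dependent rate
    k_t = 1/tau + softplus(w_f * x_t + b_f) and target g_t = tanh(w_g * x_t + b_g).
    Holding k_t and g_t fixed over a step of length dt gives the exact update
    s_t = a_t * s_{t-1} + (1 - a_t) * g_t with a_t = exp(-k_t * dt).
    Each step uses only its own input, so this loop is parallel over time.
    """
    pairs = []
    for xt in x:
        k = 1.0 / tau + softplus(w_f * xt + b_f)
        a = math.exp(-k * dt)
        g = math.tanh(w_g * xt + b_g)
        pairs.append((a, (1.0 - a) * g))
    return pairs


def combine(p: Pair, q: Pair) -> Pair:
    """Compose 'apply p, then q'. This operator is associative."""
    a1, b1 = p
    a2, b2 = q
    return (a1 * a2, a2 * b1 + b2)


def parallel_scan(pairs: List[Pair]) -> List[Pair]:
    """Inclusive Hillis-Steele scan: O(log N) rounds, each parallel over N."""
    out = list(pairs)
    offset = 1
    while offset < len(out):
        # The comprehension reads the previous round's `out` before rebinding it.
        out = [combine(out[i - offset], out[i]) if i >= offset else out[i]
               for i in range(len(out))]
        offset *= 2
    return out


def liquid_states(x, tau, w_f, b_f, w_g, b_g, s0=0.0, dt=1.0) -> List[float]:
    """States computed via the scan; element i maps s0 to s_i."""
    pairs = step_coefficients(x, tau, w_f, b_f, w_g, b_g, dt)
    return [a * s0 + b for a, b in parallel_scan(pairs)]


def liquid_states_sequential(x, tau, w_f, b_f, w_g, b_g, s0=0.0, dt=1.0) -> List[float]:
    """Reference recurrence, evaluated one step at a time."""
    s, states = s0, []
    for a, b in step_coefficients(x, tau, w_f, b_f, w_g, b_g, dt):
        s = a * s + b
        states.append(s)
    return states


if __name__ == "__main__":
    x = [0.1, 0.5, -0.3, 0.8, 0.0, -1.2, 0.4]  # toy 1-D feature sequence
    args = dict(tau=4.0, w_f=0.7, b_f=-0.2, w_g=1.3, b_g=0.1)
    par = liquid_states(x, **args)
    ref = liquid_states_sequential(x, **args)
    print(max(abs(p - r) for p, r in zip(par, ref)))  # tiny: float rounding only
```

Each step's `(a, b)` pair depends only on that step's input, so `step_coefficients` can process all N steps at once. The scan then combines them in O(log N) parallel rounds. The Hillis-Steele scan is used here for brevity and does O(N log N) total work. A work-efficient (Blelloch) scan, or the plain sequential loop, keeps total work at O(N), which matches the linear complexity the abstract claims.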
Problem

Research questions and friction points this paper is trying to address.

Temporal Action Detection
Computational Complexity
Parameter Efficiency
Resource-Constrained Deployment
Untrimmed Videos

Innovation

Methods, ideas, or system contributions that make the work stand out.

Liquid Neural Networks
Temporal Action Detection
Closed-form Continuous-time
Parameter Efficiency
Parallelizable Architecture