🤖 AI Summary
Existing event-processing methods struggle to combine fine-grained temporal modeling, polarity awareness, and efficient long-sequence representation. Most rely on frame- or voxel-based conversions, which lose information, add computational overhead, and use parameters inefficiently. To address these limitations, we propose the Frequency-aware Event Cloud Network (FECNet), the first architecture that integrates the event cloud representation with Fourier-domain feature extraction to directly model the 2S-1T-1P (two spatial dimensions, one temporal dimension, one polarity dimension) spatiotemporal structure of event camera data. FECNet introduces an event cloud group-and-sampling module and a lightweight backbone designed to keep multiply-accumulate (MAC) operations low. Evaluated on event-based object classification, action recognition, and human pose estimation, FECNet improves both accuracy and inference efficiency while substantially reducing MAC operations.
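To make the 2S-1T-1P layout concrete, the following is a minimal sketch of how raw events might be stacked into an event cloud and then grouped and sampled. It is an illustration under stated assumptions, not FECNet's actual module: the function names `to_event_cloud` and `group_and_sample`, the normalization of coordinates and timestamps to [0, 1], the ±1 polarity encoding, the equal-duration time windows, and the random per-window sampling are all hypothetical choices that the summary does not specify.

```python
import numpy as np


def to_event_cloud(x, y, t, p, width, height):
    """Stack raw events into an (N, 4) array with rows [x, y, t, p].

    Assumption: x and y are scaled by the sensor size, t is rescaled to
    [0, 1], and polarity is encoded as +1 / -1.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    t = np.asarray(t, dtype=np.float64)
    span = t.max() - t.min()
    t_norm = (t - t.min()) / span if span > 0 else np.zeros_like(t)
    pol = np.where(np.asarray(p) > 0, 1.0, -1.0)
    return np.stack([x / width, y / height, t_norm, pol], axis=1)


def group_and_sample(cloud, num_groups, points_per_group, rng):
    """Split the cloud into equal-duration time windows and draw a fixed
    number of events from each one (hypothetical grouping strategy).

    Returns an array of shape (num_groups, points_per_group, 4).
    """
    edges = np.linspace(0.0, 1.0, num_groups + 1)
    t = cloud[:, 2]
    groups = []
    for i in range(num_groups):
        lo, hi = edges[i], edges[i + 1]
        # The last window is closed on the right so that t == 1.0 is included.
        upper = (t <= hi) if i == num_groups - 1 else (t < hi)
        idx = np.flatnonzero((t >= lo) & upper)
        if idx.size == 0:
            # An empty window is filled with zeros to keep the shape fixed.
            groups.append(np.zeros((points_per_group, 4)))
            continue
        # Sample with replacement only when the window has too few events.
        chosen = rng.choice(idx, size=points_per_group,
                            replace=idx.size < points_per_group)
        chosen.sort()  # preserve the original ordering of the events
        groups.append(cloud[chosen])
    return np.stack(groups)
```

Each row keeps its own timestamp and polarity, so nothing is accumulated into frames or voxel bins before the network sees the data. Calling `group_and_sample(cloud, 4, 32, np.random.default_rng(0))` on a cloud built by `to_event_cloud` yields an array of shape `(4, 32, 4)`.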
📝 Abstract
Event cameras are biologically inspired sensors that emit events asynchronously with remarkable temporal resolution, and they have attracted significant attention from both industry and academia. Mainstream methods favor frame and voxel representations, which reach satisfactory performance but require time-consuming transformations, lead to bulky models, and sacrifice fine-grained temporal information. The point cloud representation shows promise in addressing these weaknesses, but it ignores polarity information, and its models have limited ability to abstract features from long event sequences. In this paper, we propose a frequency-aware network named FECNet that leverages the Event Cloud representation. FECNet fully utilizes the 2S-1T-1P Event Cloud (two spatial dimensions, one temporal dimension, and one polarity dimension) through a new event-based Group and Sampling module. To accommodate the long event sequences that the Event Cloud produces, FECNet performs feature extraction in the frequency domain via the Fourier transform. This approach substantially curbs the growth of multiply-accumulate operations (MACs) while still abstracting spatial-temporal features effectively. We conduct extensive experiments on event-based object classification, action recognition, and human pose estimation tasks, and the results substantiate the effectiveness and efficiency of FECNet.
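The efficiency argument for frequency-domain feature extraction can be illustrated with a small sketch. The abstract does not describe FECNet's layers, so the code below is not the authors' method; it shows a generic global-filter style of mixing, in which a sequence of N feature vectors is transformed with an FFT, multiplied elementwise by a per-frequency filter, and transformed back. By the convolution theorem this equals a circular convolution over the whole sequence, computed in O(N log N) rather than the O(N²) of the direct form. The names `fourier_mix` and `circular_conv_reference` are hypothetical, and the filter is random here, whereas a trained model would learn it.

```python
import numpy as np


def fourier_mix(features, freq_filter):
    """Mix information along the sequence axis in the frequency domain.

    features:    real array of shape (N, C), one row per event or group.
    freq_filter: complex array of shape (N // 2 + 1, C), one weight per
                 frequency bin and channel (assumed learnable in practice).
    """
    n = features.shape[0]
    spectrum = np.fft.rfft(features, axis=0)
    return np.fft.irfft(spectrum * freq_filter, n=n, axis=0)


def circular_conv_reference(features, kernel):
    """Direct O(N^2) circular convolution per channel, for comparison only."""
    n = features.shape[0]
    out = np.zeros_like(features)
    for i in range(n):
        for j in range(n):
            out[i] += kernel[j] * features[(i - j) % n]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((64, 8))
    kernel = rng.standard_normal((64, 8))
    fast = fourier_mix(feats, np.fft.rfft(kernel, axis=0))
    slow = circular_conv_reference(feats, kernel)
    print("max abs difference:", np.abs(fast - slow).max())
```

The direct version needs N² multiply-accumulates per channel, which grows quickly for sequences of thousands of events, while the FFT route scales near-linearly in N. This is the general reason a Fourier-based design can keep MACs low on long sequences.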