PFM-VEPAR: Prompting Foundation Models for RGB-Event Camera based Pedestrian Attribute Recognition

📅 2026-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work targets pedestrian attribute recognition with RGB-event fusion, where event cameras compensate for RGB degradation under low-light and motion-blurred conditions; existing fusion methods, however, suffer from high computational overhead and make poor use of contextual samples. To overcome these limitations, the authors propose a lightweight RGB-event framework for pedestrian attribute recognition that eschews auxiliary backbone networks and instead applies minimal discrete cosine transform (DCT) and inverse DCT (IDCT) operations to extract frequency-domain cues from event streams, which in turn enhance the RGB features. The framework further incorporates an external memory bank coupled with modern Hopfield networks to model global cross-sample associations, enriching the learned representations. Multimodal fusion is handled by an efficient cross-attention mechanism. Experimental results on multiple benchmarks show that the method achieves superior accuracy at significantly lower computational cost, outperforming current state-of-the-art approaches.
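The summary describes the Event Prompter only at a high level. Its core operation, a DCT, low-frequency masking, and IDCT round trip applied to an event frame in place of an auxiliary backbone, can be illustrated with the minimal PyTorch sketch below; the class name `EventPrompter`, the `keep_ratio` band, and the orthonormal DCT-II construction are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the Event Prompter idea (names are illustrative):
# a DCT -> low-frequency masking -> IDCT round trip on an event frame,
# replacing an auxiliary event backbone.
import math
import torch
import torch.nn as nn


def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)  # frequency index
    i = torch.arange(n, dtype=torch.float32).unsqueeze(0)  # spatial index
    basis = torch.cos(math.pi * (2 * i + 1) * k / (2 * n))
    basis[0] /= math.sqrt(2)                    # orthonormal scaling of row 0
    return basis * math.sqrt(2.0 / n)


class EventPrompter(nn.Module):
    """Extracts frequency-domain cues from an event frame with a single
    DCT/IDCT pass; `keep_ratio` sets the retained low-frequency band."""

    def __init__(self, size: int, keep_ratio: float = 0.25):
        super().__init__()
        self.register_buffer("D", dct_matrix(size))  # (size, size)
        self.keep = max(1, int(size * keep_ratio))

    def forward(self, event: torch.Tensor) -> torch.Tensor:
        # event: (B, C, H, W) event frames with H == W == size
        D = self.D
        freq = D @ event @ D.t()                # separable 2-D DCT
        mask = torch.zeros_like(freq)
        mask[..., : self.keep, : self.keep] = 1.0
        return D.t() @ (freq * mask) @ D        # inverse DCT


# Hypothetical usage: the output has the same shape as the input frames.
# prompt = EventPrompter(size=224)(event_frames)
```

In the full framework, the resulting low-frequency reconstruction would act as a prompt that augments the RGB branch rather than as a standalone feature.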

📝 Abstract
Event-based pedestrian attribute recognition (PAR) leverages motion cues to complement RGB cameras in low-light and motion-blur scenarios, enabling more accurate inference of attributes such as age and emotion. However, existing two-stream multimodal fusion methods introduce significant computational overhead and neglect the valuable guidance available from contextual samples. To address these limitations, this paper proposes an Event Prompter. Discarding the computationally expensive auxiliary backbone, this module directly applies lightweight and efficient Discrete Cosine Transform (DCT) and Inverse DCT (IDCT) operations to the event data, extracting frequency-domain event features at minimal computational cost and thereby effectively augmenting the RGB branch. Furthermore, an external memory bank that provides rich prior knowledge, combined with modern Hopfield networks, enables associative memory-augmented representation learning; this mechanism mines and exploits global relational knowledge across samples. Finally, a cross-attention mechanism fuses the RGB and event modalities, followed by feed-forward networks for attribute prediction. Extensive experiments on multiple benchmark datasets validate the effectiveness of the proposed RGB-Event PAR framework. The source code will be released at https://github.com/Event-AHU/OpenPAR
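The associative-memory step described above follows the modern Hopfield retrieval rule: a query is compared against all stored patterns with a temperature-scaled softmax, and the retrieval is the resulting convex combination of patterns. A minimal sketch, assuming a fixed-size learnable memory bank; `num_slots`, `beta`, and the residual connection are assumptions rather than the paper's exact design:

```python
# Minimal sketch of associative memory retrieval with a modern Hopfield
# update over an external memory bank (num_slots and beta are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F


class HopfieldMemory(nn.Module):
    """Enriches token features with prior knowledge retrieved from a
    learnable memory bank via softmax(beta * q M^T) M."""

    def __init__(self, dim: int, num_slots: int = 512, beta: float = 8.0):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.beta = beta  # inverse temperature; higher -> sharper retrieval

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # query: (B, N, dim) token features from the RGB branch
        attn = F.softmax(self.beta * query @ self.memory.t(), dim=-1)
        retrieved = attn @ self.memory          # convex combination of slots
        return query + retrieved                # residual enrichment
```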
Problem

Research questions and friction points this paper is trying to address.

pedestrian attribute recognition
event camera
multimodal fusion
computational overhead
contextual guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Event Prompter
Discrete Cosine Transform
Associative Memory
Cross-Attention Fusion (see the sketch after this list)
RGB-Event Fusion
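The cross-attention fusion named above can be sketched with PyTorch's built-in multi-head attention: RGB tokens act as queries over event tokens, and a feed-forward head produces per-attribute logits. The layer layout, mean pooling, and the 26-attribute head (the attribute count of PA100K-style benchmarks) are assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the final cross-attention fusion and attribute head
# (layer layout and the 26-attribute head are assumptions).
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """RGB tokens (queries) attend to event tokens (keys/values); a
    feed-forward head then emits multi-label attribute logits."""

    def __init__(self, dim: int, num_heads: int = 8, num_attrs: int = 26):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, num_attrs)
        )

    def forward(self, rgb: torch.Tensor, event: torch.Tensor) -> torch.Tensor:
        # rgb: (B, N, dim) query tokens; event: (B, M, dim) key/value tokens
        fused, _ = self.attn(query=rgb, key=event, value=event)
        fused = self.norm(rgb + fused)          # residual + layer norm
        # Mean-pool tokens, then predict logits (sigmoid + BCE in training).
        return self.head(fused.mean(dim=1))     # (B, num_attrs)
```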
👥 Authors
Minghe Xu, City University of Macau, Macau SAR, China
Rouying Wu, Macau University of Science and Technology, Macau SAR, China
ChiaWei Chu, City University of Macau, Macau SAR, China
Xiao Wang, School of Computer Science and Technology, Anhui University, Hefei 230601, China
Yu Li, University of Science and Technology