PFM-VEPAR: Prompting Foundation Models for RGB-Event Camera based Pedestrian Attribute Recognition

📅 2026-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work targets pedestrian attribute recognition with RGB-event fusion, where event cameras compensate for RGB degradation under low-light and motion-blurred conditions; existing fusion methods, however, suffer from high computational overhead and make poor use of contextual samples. To overcome these limitations, the authors propose a lightweight RGB-event framework for pedestrian attribute recognition that eschews auxiliary backbone networks and instead applies minimal discrete cosine transform (DCT) and inverse DCT (IDCT) operations to extract frequency-domain cues from event streams, which in turn enhance the RGB features. The framework further incorporates an external memory bank coupled with modern Hopfield networks to model global cross-sample associations, enriching the learned representations. Multimodal fusion is handled by an efficient cross-attention mechanism. Experimental results on multiple benchmarks show that the method achieves superior accuracy at significantly lower computational cost, outperforming current state-of-the-art approaches.
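The summary describes the Event Prompter only at a high level. Its core operation, a DCT, low-frequency masking, and IDCT round trip applied to an event frame in place of an auxiliary backbone, can be illustrated with the minimal PyTorch sketch below; the class name `EventPrompter`, the `keep_ratio` band, and the orthonormal DCT-II construction are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the Event Prompter idea (names are illustrative):
# a DCT -> low-frequency masking -> IDCT round trip on an event frame,
# replacing an auxiliary event backbone.
import math
import torch
import torch.nn as nn


def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)  # frequency index
    i = torch.arange(n, dtype=torch.float32).unsqueeze(0)  # spatial index
    basis = torch.cos(math.pi * (2 * i + 1) * k / (2 * n))
    basis[0] /= math.sqrt(2)                    # orthonormal scaling of row 0
    return basis * math.sqrt(2.0 / n)


class EventPrompter(nn.Module):
    """Extracts frequency-domain cues from an event frame with a single
    DCT/IDCT pass; `keep_ratio` sets the retained low-frequency band."""

    def __init__(self, size: int, keep_ratio: float = 0.25):
        super().__init__()
        self.register_buffer("D", dct_matrix(size))  # (size, size)
        self.keep = max(1, int(size * keep_ratio))

    def forward(self, event: torch.Tensor) -> torch.Tensor:
        # event: (B, C, H, W) event frames with H == W == size
        D = self.D
        freq = D @ event @ D.t()                # separable 2-D DCT
        mask = torch.zeros_like(freq)
        mask[..., : self.keep, : self.keep] = 1.0
        return D.t() @ (freq * mask) @ D        # inverse DCT


# Hypothetical usage: the output has the same shape as the input frames.
# prompt = EventPrompter(size=224)(event_frames)
```

In the full framework, the resulting low-frequency reconstruction would act as a prompt that augments the RGB branch rather than as a standalone feature.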

📝 Abstract
Event-based pedestrian attribute recognition (PAR) leverages motion cues to complement RGB cameras in low-light and motion-blur scenarios, enabling more accurate inference of attributes such as age and emotion. However, existing two-stream multimodal fusion methods introduce significant computational overhead and neglect the valuable guidance available from contextual samples. To address these limitations, this paper proposes an Event Prompter. Discarding the computationally expensive auxiliary backbone, this module directly applies lightweight and efficient Discrete Cosine Transform (DCT) and Inverse DCT (IDCT) operations to the event data, extracting frequency-domain event features at minimal computational cost and thereby effectively augmenting the RGB branch. Furthermore, an external memory bank that provides rich prior knowledge, combined with modern Hopfield networks, enables associative memory-augmented representation learning; this mechanism mines and exploits global relational knowledge across samples. Finally, a cross-attention mechanism fuses the RGB and event modalities, followed by feed-forward networks for attribute prediction. Extensive experiments on multiple benchmark datasets validate the effectiveness of the proposed RGB-Event PAR framework. The source code will be released at https://github.com/Event-AHU/OpenPAR
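The associative-memory step described above follows the modern Hopfield retrieval rule: a query is compared against all stored patterns with a temperature-scaled softmax, and the retrieval is the resulting convex combination of patterns. A minimal sketch, assuming a fixed-size learnable memory bank; `num_slots`, `beta`, and the residual connection are assumptions rather than the paper's exact design:

```python
# Minimal sketch of associative memory retrieval with a modern Hopfield
# update over an external memory bank (num_slots and beta are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F


class HopfieldMemory(nn.Module):
    """Enriches token features with prior knowledge retrieved from a
    learnable memory bank via softmax(beta * q M^T) M."""

    def __init__(self, dim: int, num_slots: int = 512, beta: float = 8.0):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.beta = beta  # inverse temperature; higher -> sharper retrieval

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # query: (B, N, dim) token features from the RGB branch
        attn = F.softmax(self.beta * query @ self.memory.t(), dim=-1)
        retrieved = attn @ self.memory          # convex combination of slots
        return query + retrieved                # residual enrichment
```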
Problem

Research questions and friction points this paper is trying to address.

pedestrian attribute recognition
event camera
multimodal fusion
computational overhead
contextual guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Event Prompter
Discrete Cosine Transform
Associative Memory
Cross-Attention Fusion (see the sketch after this list)
RGB-Event Fusion
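The cross-attention fusion named above can be sketched with PyTorch's built-in multi-head attention: RGB tokens act as queries over event tokens, and a feed-forward head produces per-attribute logits. The layer layout, mean pooling, and the 26-attribute head (the attribute count of PA100K-style benchmarks) are assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the final cross-attention fusion and attribute head
# (layer layout and the 26-attribute head are assumptions).
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """RGB tokens (queries) attend to event tokens (keys/values); a
    feed-forward head then emits multi-label attribute logits."""

    def __init__(self, dim: int, num_heads: int = 8, num_attrs: int = 26):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, num_attrs)
        )

    def forward(self, rgb: torch.Tensor, event: torch.Tensor) -> torch.Tensor:
        # rgb: (B, N, dim) query tokens; event: (B, M, dim) key/value tokens
        fused, _ = self.attn(query=rgb, key=event, value=event)
        fused = self.norm(rgb + fused)          # residual + layer norm
        # Mean-pool tokens, then predict logits (sigmoid + BCE in training).
        return self.head(fused.mean(dim=1))     # (B, num_attrs)
```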
👥 Authors
Minghe Xu, City University of Macau, Macau SAR, China
Rouying Wu, Macau University of Science and Technology, Macau SAR, China
ChiaWei Chu, City University of Macau, Macau SAR, China
Xiao Wang, School of Computer Science and Technology, Anhui University, Hefei 230601, China
Yu Li, University of Science and Technology