🤖 AI Summary
This work proposes a privacy-preserving human activity recognition method for highly sensitive environments such as restrooms and changing rooms, where conventional RGB-based surveillance poses significant privacy risks. Existing approaches often sacrifice semantic expressiveness or remain vulnerable to image reconstruction attacks. To address these limitations, the proposed solution builds on the AI Flow framework within an edge-cloud collaborative architecture. At the edge, raw images undergo a millisecond-level, nonlinear, irreversible transformation combined with random noise injection to produce anonymized feature vectors. The cloud then performs multimodal large-model inference solely on these vectors, enabling identity-agnostic anomaly detection. By design, the architecture severs potential privacy-leakage pathways and ensures that the original imagery is mathematically irreversible. It thereby balances high-level semantic understanding, real-time deployability, and stringent privacy protection in sensitive settings.
📝 Abstract
As intelligent sensing expands into high-privacy environments such as restrooms and changing rooms, the field faces a critical privacy-security paradox. Traditional RGB surveillance raises significant concerns about visual recording and storage. Meanwhile, existing privacy-preserving methods, ranging from physical desensitization to traditional cryptographic or obfuscation techniques, often compromise semantic understanding or fail to guarantee mathematical irreversibility against reconstruction attacks. To address these challenges, this study presents a privacy-preserving perception technology based on the AI Flow theoretical framework and an edge-cloud collaborative architecture. The methodology integrates source desensitization with irreversible feature mapping. Guided by Information Bottleneck theory, the edge device performs millisecond-level processing that transforms raw imagery into abstract feature vectors through non-linear mapping and stochastic noise injection. This process establishes a unidirectional information flow that strips identity-sensitive attributes and renders reconstruction of the original images impossible. The cloud platform then uses a family of multimodal models to perform joint inference solely on these abstract vectors and detect abnormal behaviors. This approach severs the path to privacy leakage at the architectural level. It moves from video surveillance to de-identified behavior perception and offers a robust solution for risk management in high-sensitivity public spaces.
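To make the edge/cloud split concrete, here is a minimal sketch under stated assumptions. It is not the authors' method: the abstract does not specify the mapping, the noise model, or the cloud models. A fixed random projection followed by `tanh` stands in for the non-linear mapping, and additive Gaussian noise stands in for stochastic noise injection. A simple distance-to-centroid score replaces the multimodal cloud inference. The names `EdgeEncoder`, `centroid`, and `anomaly_score` and the parameters `d_in`, `d_out`, `noise_std`, `seed`, and `noise_seed` are all hypothetical. Projecting to fewer dimensions makes the map many-to-one, and the noise makes it stochastic. However, this toy does not by itself establish the irreversibility guarantee the abstract claims.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


class EdgeEncoder:
    """Hypothetical edge-side transform: random projection -> tanh -> Gaussian noise.

    Illustrative stand-in only; not the paper's actual feature mapping.
    """

    def __init__(self, d_in: int, d_out: int, noise_std: float,
                 seed: int = 0, noise_seed: Optional[int] = None) -> None:
        if d_out >= d_in:
            raise ValueError("d_out must be smaller than d_in (lossy projection)")
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(d_in)
        # Fixed random projection matrix, assumed to stay on the edge device.
        self.weights = [[rng.gauss(0.0, scale) for _ in range(d_in)]
                        for _ in range(d_out)]
        self.noise_std = noise_std
        # Separate RNG for noise; unseeded by default, so repeated calls differ.
        self._noise = random.Random(noise_seed)

    def encode(self, pixels: Sequence[float]) -> Vector:
        """Map a flattened frame to a short, noisy feature vector."""
        if len(pixels) != len(self.weights[0]):
            raise ValueError("input length must equal d_in")
        features = []
        for row in self.weights:
            z = sum(w * x for w, x in zip(row, pixels))  # linear projection
            # tanh squashes into (-1, 1); the noise term may push slightly past it.
            features.append(math.tanh(z) + self._noise.gauss(0.0, self.noise_std))
        return features


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Mean of reference ('normal') feature vectors; only vectors reach the cloud."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def anomaly_score(vector: Vector, center: Vector) -> float:
    """Euclidean distance to the reference centroid; larger means more unusual."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vector, center)))


if __name__ == "__main__":
    encoder = EdgeEncoder(d_in=64, d_out=8, noise_std=0.05, seed=42, noise_seed=7)
    frames = random.Random(1)  # synthetic "pixels", not real data
    normal = [[frames.uniform(0.0, 0.2) for _ in range(64)] for _ in range(20)]
    center = centroid([encoder.encode(f) for f in normal])
    unusual = [frames.uniform(0.8, 1.0) for _ in range(64)]
    print("normal :", round(anomaly_score(encoder.encode(normal[0]), center), 3))
    print("unusual:", round(anomaly_score(encoder.encode(unusual), center), 3))
```

In this sketch the cloud side never sees a frame, only 8-dimensional noisy vectors. A real deployment would replace the centroid score with whatever model the cloud actually runs.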