🤖 AI Summary
Privacy risks in visual data captured by AI-powered cameras in intelligent transportation systems remain unresolved, because conventional blurring and encryption techniques fail to ensure privacy and data utility at the same time. To address this, we propose an image-to-text privacy-preserving paradigm that replaces raw image transmission with semantically abstracted textual descriptions, inherently thwarting reconstruction-based attacks. Methodologically, we introduce a novel Mixture-of-Experts (MoE) architecture for fine-grained scene understanding and integrate a reinforcement learning agent that dynamically optimizes the text generation policy, balancing semantic fidelity against privacy strength. Evaluated on the CFP-FP dataset, our approach reduces the replay attack success rate to 9.4%, outperforming baselines. The generated descriptions exhibit richer semantics and improved structural coherence, indicating that the method enhances data usability without compromising privacy guarantees.
📝 Abstract
The proliferation of AI-powered cameras in Intelligent Transportation Systems (ITS) creates a severe conflict between the need for rich visual data and the fundamental right to privacy. Existing privacy-preserving mechanisms, such as blurring or encryption, are often insufficient: they force a trade-off in which either privacy is compromised by advanced reconstruction attacks or data utility is critically degraded. To resolve this impasse, we propose RL-MoE, a novel framework that transforms sensitive visual data into privacy-preserving textual descriptions, eliminating the need for direct image transmission. RL-MoE combines a Mixture-of-Experts (MoE) architecture for nuanced, multi-aspect scene decomposition with a Reinforcement Learning (RL) agent that optimizes the generated text for a dual objective of semantic accuracy and privacy preservation. Extensive experiments demonstrate that RL-MoE provides superior privacy protection, reducing the success rate of replay attacks to just 9.4% on the CFP-FP dataset, while simultaneously generating richer textual content than baseline methods. Our work provides a practical and scalable solution for building trustworthy AI systems in privacy-sensitive domains, paving the way for more secure smart city and autonomous vehicle networks.
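The dual objective mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward formulation, so everything here is an assumption made for illustration: the linear weighting, the function names `dual_objective_reward` and `pick_description`, the `weight` parameter, and the made-up candidate scores. It is not the authors' method, only one simple way to trade semantic accuracy against privacy leakage.

```python
from typing import Dict, List


def dual_objective_reward(semantic_score: float, privacy_leak: float,
                          weight: float = 0.5) -> float:
    """Hypothetical scalar reward (an assumption, not the paper's formula).

    Rewards semantic accuracy and penalizes privacy leakage; `weight`
    in [0, 1] sets how much semantic accuracy counts relative to privacy.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * semantic_score - (1.0 - weight) * privacy_leak


def pick_description(candidates: List[Dict], weight: float = 0.5) -> str:
    """Return the text of the candidate with the highest hypothetical reward."""
    best = max(
        candidates,
        key=lambda c: dual_objective_reward(c["semantic"], c["leak"], weight),
    )
    return best["text"]


# Illustrative candidates with invented scores (not experimental data).
candidates = [
    {"text": "A red sedan with plate ABC-123 stops at the light",
     "semantic": 0.9, "leak": 0.8},
    {"text": "A red sedan stops at a traffic light",
     "semantic": 0.8, "leak": 0.1},
]

chosen = pick_description(candidates, weight=0.5)
```

With equal weighting, the description that omits the plate number scores higher because its small loss in semantic detail is outweighed by the lower leakage. Setting `weight=1.0` ignores privacy entirely and selects the more detailed description instead.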