🤖 AI Summary
Image privacy classification is challenging because privacy judgements are highly subjective and image content is semantically diverse. Existing approaches that combine graph neural networks (GNNs) with CNNs are complex and parameter-heavy. This paper proposes a lightweight, scene-driven alternative that drops the graph component and instead uses transfer learning with a CNN to relate scene types to privacy. The evaluation indicates that the graph structure contributes negligibly to classification performance and that a high-dimensional feature vector for each visual entity is unnecessary. The proposed model optimises only 732 parameters yet achieves performance comparable to graph-based methods that have between 14,000 and 500 million parameters, a reduction of more than 99.99% relative to the largest of them. Because it avoids end-to-end training of a graph-based pipeline, the contribution of each component to the result is easier to inspect. The core contribution is showing that scene-level information is enough to match graph-based methods on privacy classification at a small fraction of the model size.
📝 Abstract
Subjective interpretation and content diversity make predicting whether an image is private or public a challenging task. Methods that combine graph neural networks with convolutional neural networks (CNNs), and that have between 14,000 and 500 million parameters, generate features for visual entities (e.g., scene and object types) and identify the entities that contribute to the decision. In this paper, we show that a simpler combination of transfer learning and a CNN that relates privacy to scene types optimises only 732 parameters while achieving performance comparable to that of graph-based methods. In contrast, end-to-end training of graph-based methods can mask the contribution of individual components to the classification performance. Furthermore, we show that a high-dimensional feature vector, extracted with CNNs for each visual entity, is unnecessary and complicates the model. The graph component also has negligible impact on performance, which is instead driven by fine-tuning the CNN to optimise image features for privacy nodes.
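The figure of 732 parameters can be reproduced with simple arithmetic under one plausible reading, which the abstract itself does not confirm: a pretrained scene classifier outputs scores for 365 scene types, and a single fully connected layer maps those scores to two classes (public, private), giving 365 × 2 weights plus 2 biases. The sketch below is illustrative only; the number of scene types, the class name ScenePrivacyHead, and the layer layout are assumptions, not the authors' implementation.

```python
import math


def linear_param_count(n_inputs: int, n_outputs: int) -> int:
    """Number of weights plus biases in one fully connected layer."""
    return n_inputs * n_outputs + n_outputs


class ScenePrivacyHead:
    """Hypothetical trainable head: scene scores -> (public, private) probabilities."""

    def __init__(self, n_scenes: int, n_classes: int = 2):
        # Zero initialisation keeps the example deterministic.
        self.weights = [[0.0] * n_scenes for _ in range(n_classes)]
        self.biases = [0.0] * n_classes

    def num_parameters(self) -> int:
        return sum(len(row) for row in self.weights) + len(self.biases)

    def predict_proba(self, scene_scores):
        # Linear map followed by a numerically stable softmax.
        logits = [
            sum(w * x for w, x in zip(row, scene_scores)) + b
            for row, b in zip(self.weights, self.biases)
        ]
        peak = max(logits)
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Assumption: 365 scene types from the pretrained scene classifier.
N_SCENES = 365
head = ScenePrivacyHead(N_SCENES)
print(linear_param_count(N_SCENES, 2), head.num_parameters())
```

Under this reading, only the head is optimised for privacy, while the scene classifier supplies its inputs through transfer learning.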