🤖 AI Summary
Indoor scene classification is difficult because objects relate to one another in complex ways and spatial layouts vary widely. Sensitive content detection, such as child sexual abuse imagery (CSAI) classification, adds further requirements: the model must be accurate, interpretable, and privacy-preserving at the same time. To address this, we propose ASGRA, a framework that converts input images into structured scene graphs and applies a Graph Attention Network (GAT) to jointly model semantic and spatial relationships among objects, so the classifier operates on graphs rather than raw pixels. Because the classifier sees only graphs, it can be trained without direct access to sensitive imagery, and the identified objects and relationships make its predictions explainable. On the Places8 benchmark, ASGRA achieves 81.27% balanced accuracy, surpassing image-based methods. In a real-world CSAI evaluation conducted with law enforcement, it attains 74.27% balanced accuracy. These results support structured scene representations as a robust approach to both tasks.
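To make the scene-graph-plus-attention idea concrete, here is a minimal sketch of one single-head graph attention layer over a toy scene graph, in plain Python. It is not the ASGRA architecture: the object labels, relations, one-hot node features, random weights, and mean-pool readout are all assumptions chosen for illustration, and the relation predicates are ignored for simplicity.

```python
import math
import random

# Hypothetical toy scene graph: nodes are detected objects, edges are
# (subject index, predicate, object index) relations. Illustrative only.
objects = ["bed", "pillow", "lamp", "nightstand"]
relations = [(1, "on", 0), (2, "on", 3), (3, "next to", 0)]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def neighbors(num_nodes, edges):
    # Assumption: relations are treated as undirected, with self-loops added.
    nbrs = {i: {i} for i in range(num_nodes)}
    for subj, _pred, obj in edges:
        nbrs[subj].add(obj)
        nbrs[obj].add(subj)
    return {i: sorted(js) for i, js in nbrs.items()}

def gat_layer(features, nbrs, weight, attn_vec):
    """Single-head graph attention: returns new node features and attention weights."""
    projected = [matvec(weight, h) for h in features]
    new_features, attention = [], {}
    for i, js in nbrs.items():
        # Score each neighbor j from the concatenated projections of i and j.
        scores = [
            leaky_relu(sum(a * v for a, v in zip(attn_vec, projected[i] + projected[j])))
            for j in js
        ]
        # Numerically stable softmax over the neighborhood of node i.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attention[i] = dict(zip(js, alphas))
        # New feature of node i: attention-weighted sum of neighbor projections.
        new_features.append([
            sum(alpha * projected[j][k] for alpha, j in zip(alphas, js))
            for k in range(len(weight))
        ])
    return new_features, attention

rng = random.Random(0)  # random, untrained weights (assumption)
hidden = 3
features = [one_hot(i, len(objects)) for i in range(len(objects))]
weight = [[rng.uniform(-1, 1) for _ in objects] for _ in range(hidden)]
attn_vec = [rng.uniform(-1, 1) for _ in range(2 * hidden)]

nbrs = neighbors(len(objects), relations)
node_out, attention = gat_layer(features, nbrs, weight, attn_vec)

# Mean-pool node features into one graph-level vector for a downstream classifier.
graph_embedding = [sum(col) / len(node_out) for col in zip(*node_out)]

for i, weights in attention.items():
    ranked = sorted(weights.items(), key=lambda kv: -kv[1])
    print(objects[i], "attends to", [(objects[j], round(a, 2)) for j, a in ranked])
```

The per-node attention weights printed at the end are the kind of signal that could back an explanation of which objects and relationships drove a prediction. In a trained model the weights would be learned rather than random.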
📝 Abstract
Indoor scene classification is a critical task in computer vision, with applications ranging from robotics to sensitive content analysis, such as child sexual abuse imagery (CSAI) classification. The problem is particularly challenging due to the intricate relationships between objects and complex spatial layouts. In this work, we propose Attention over Scene Graphs for Sensitive Content Analysis (ASGRA), a novel framework that operates on structured graph representations instead of raw pixels. By first converting images into scene graphs and then employing a Graph Attention Network for inference, ASGRA directly models the interactions between a scene's components. This approach offers two key benefits: (i) inherent explainability via object and relationship identification, and (ii) privacy preservation, enabling model training without direct access to sensitive images. On Places8, we achieve 81.27% balanced accuracy, surpassing image-based methods. A real-world CSAI evaluation with law enforcement yields 74.27% balanced accuracy. Our results establish structured scene representations as a robust paradigm for indoor scene classification and CSAI classification. Code is publicly available at https://github.com/tutuzeraa/ASGRA.
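Both reported figures are balanced accuracies, which are conventionally computed as the mean of per-class recall, so a rare class counts as much as a common one. The sketch below illustrates that standard definition. The class labels and predictions are made-up toy data, not results from the paper, and the helper name `balanced_accuracy` is hypothetical.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)

# Toy, imbalanced example (illustrative labels only).
y_true = ["bedroom"] * 8 + ["bathroom"] * 2
y_pred = ["bedroom"] * 8 + ["bedroom", "bathroom"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

In this toy case plain accuracy is 0.9 while balanced accuracy is 0.75, because the minority class is recalled only half the time. This is why balanced accuracy suits datasets with uneven class sizes.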