🤖 AI Summary
Scene Graph Generation (SGG) faces a trade-off among relation prediction accuracy, object detection accuracy, and inference latency, and existing methods typically optimize only one of the three. This work proposes REACT, which combines a new architecture, inference method, and relation prediction model to balance all three objectives for real-time SGG. Compared with state-of-the-art approaches, REACT is 2.7 times faster, reaching a latency of 23 ms (roughly a 63% reduction), improves object detection accuracy by 58.51%, and uses 5.5x fewer parameters on average, all without sacrificing relation prediction performance.
📝 Abstract
Scene Graph Generation (SGG) is a task that encodes visual relationships between objects in images as graph structures. SGG shows significant promise as a foundational component for downstream tasks, such as reasoning for embodied agents. To enable real-time applications, SGG must address the trade-off between performance and inference speed. However, current methods tend to focus on one of the following: (1) improving relation prediction accuracy, (2) enhancing object detection accuracy, or (3) reducing latency, without aiming to balance all three objectives simultaneously. To address this limitation, we propose a novel architecture, inference method, and relation prediction model. Our proposed solution, the REACT model, achieves the highest inference speed among existing SGG models, improving object detection accuracy without sacrificing relation prediction performance. Compared to state-of-the-art approaches, REACT is 2.7 times faster (with a latency of 23 ms) and improves object detection accuracy by 58.51%. Furthermore, our proposal significantly reduces model size, with an average of 5.5x fewer parameters. Code is available at https://github.com/Maelic/SGG-Benchmark