REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation

📅 2024-05-25

📈 Citations: 1

✨ Influential: 1

🤖 AI Summary

Scene Graph Generation (SGG) faces a trade-off among relation prediction accuracy, object detection accuracy, and inference latency, and existing methods usually optimize only one of the three. REACT targets all three at once by combining a new architecture, a new inference method, and a new relation prediction model. The resulting model reaches 23 ms inference latency (2.7 times faster than state-of-the-art approaches, roughly a 63% reduction in latency), improves object detection accuracy by 58.51%, and uses 5.5x fewer parameters on average, all without sacrificing relation prediction performance.
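The speed figures quoted for REACT (a 2.7x speedup and a roughly 63% latency reduction) describe the same measurement in two ways. The short sketch below shows the conversion. The function names are illustrative, and the implied baseline latency is derived from the two reported numbers rather than reported by the authors.

```python
def latency_reduction(speedup: float) -> float:
    """Fractional latency reduction implied by a speedup factor.

    A model that is `speedup` times faster needs 1/speedup of the
    original time, so the reduction is 1 - 1/speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


def implied_baseline_latency_ms(latency_ms: float, speedup: float) -> float:
    """Latency of the slower reference model implied by the speedup.

    This is a derived quantity, not a number taken from the paper.
    """
    return latency_ms * speedup


# Figures from the abstract: 23 ms latency, 2.7 times faster.
reduction = latency_reduction(2.7)
baseline_ms = implied_baseline_latency_ms(23.0, 2.7)

print(f"latency reduction: {reduction:.1%}")
print(f"implied baseline latency: {baseline_ms:.1f} ms")
```

Under these assumptions, a 2.7x speedup corresponds to a reduction of about 63%, which is consistent with the summary.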

📝 Abstract

Scene Graph Generation (SGG) is a task that encodes visual relationships between objects in images as graph structures. SGG shows significant promise as a foundational component for downstream tasks, such as reasoning for embodied agents. To enable real-time applications, SGG must address the trade-off between performance and inference speed. However, current methods tend to focus on one of the following: (1) improving relation prediction accuracy, (2) enhancing object detection accuracy, or (3) reducing latency, without aiming to balance all three objectives simultaneously. To address this limitation, we propose a novel architecture, inference method, and relation prediction model. Our proposed solution, the REACT model, achieves the highest inference speed among existing SGG models, improving object detection accuracy without sacrificing relation prediction performance. Compared to state-of-the-art approaches, REACT is 2.7 times faster (with a latency of 23 ms) and improves object detection accuracy by 58.51%. Furthermore, our proposal significantly reduces model size, with an average of 5.5x fewer parameters. Code is available at https://github.com/Maelic/SGG-Benchmark
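To make the notion of a scene graph concrete, here is a minimal sketch of the data structure that an SGG model outputs: detected objects as nodes and (subject, predicate, object) relations as directed edges. All class names, fields, and example values are hypothetical illustrations and are not taken from the REACT code base.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class SceneObject:
    """A detected object: one node of the scene graph."""
    label: str
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    score: float                            # detection confidence


@dataclass(frozen=True)
class Relation:
    """A directed edge: subject --predicate--> object."""
    subject: int    # index into SceneGraph.objects
    predicate: str
    obj: int        # index into SceneGraph.objects
    score: float    # relation confidence


@dataclass
class SceneGraph:
    objects: list[SceneObject] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def add_object(self, label: str, box: tuple[float, float, float, float],
                   score: float) -> int:
        """Append a node and return its index."""
        self.objects.append(SceneObject(label, box, score))
        return len(self.objects) - 1

    def add_relation(self, subject: int, predicate: str, obj: int,
                     score: float) -> None:
        """Append an edge after checking that both endpoints exist."""
        for idx in (subject, obj):
            if not 0 <= idx < len(self.objects):
                raise IndexError(f"no object with index {idx}")
        self.relations.append(Relation(subject, predicate, obj, score))

    def triplets(self) -> list[tuple[str, str, str]]:
        """Return relations as human-readable label triplets."""
        return [
            (self.objects[r.subject].label, r.predicate, self.objects[r.obj].label)
            for r in self.relations
        ]


# Made-up example image: a person riding a horse that stands on grass.
graph = SceneGraph()
person = graph.add_object("person", (120.0, 40.0, 220.0, 260.0), 0.97)
horse = graph.add_object("horse", (90.0, 150.0, 330.0, 400.0), 0.95)
grass = graph.add_object("grass", (0.0, 350.0, 640.0, 480.0), 0.88)
graph.add_relation(person, "riding", horse, 0.91)
graph.add_relation(horse, "on", grass, 0.84)

print(graph.triplets())
```

The three objectives named in the abstract map onto this structure: object detection accuracy concerns the nodes, relation prediction accuracy concerns the edges, and latency is the time needed to produce the whole graph for one image.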

Problem

Research questions and friction points this paper is trying to address.

Balancing performance and speed trade-offs in Scene Graph Generation

Addressing the limitation of focusing on single objectives in SGG

Achieving real-time inference without sacrificing accuracy in SGG

Innovation

Methods, ideas, or system contributions that make the work stand out.

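Balances relation prediction accuracy, object detection accuracy, and inference latency simultaneously

Achieves the fastest inference speed among existing SGG models (23 ms latency, 2.7 times faster than state-of-the-art approaches)

Significantly reduces model size, with 5.5x fewer parameters on average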
