REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation

📅 2024-05-25
📈 Citations: 1
Influential: 1
🤖 AI Summary
Scene Graph Generation (SGG) faces a trade-off among relation prediction accuracy, object detection accuracy, and inference latency, and existing methods typically optimize only one of the three. The paper proposes REACT, which combines a new architecture, inference method, and relation prediction model to balance all three for real-time use. REACT reaches 23 ms inference latency (2.7 times faster than state-of-the-art approaches, roughly a 63% reduction), improves object detection accuracy by 58.51%, and uses 5.5× fewer parameters on average, without sacrificing relation prediction performance.
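
The 63% latency reduction quoted here and the "2.7 times faster" figure in the abstract describe the same measurement. A quick check in Python, assuming the baseline latency is simply 2.7 times REACT's 23 ms (the baseline value itself is not reported on this page, so it is derived here, not quoted):

```python
def latency_reduction(speedup: float) -> float:
    """Fractional latency reduction implied by a speedup factor."""
    return 1.0 - 1.0 / speedup


react_latency_ms = 23.0  # reported latency
speedup = 2.7            # reported speedup over state-of-the-art approaches

# Derived value: an assumption for illustration, not a number from the paper.
implied_baseline_ms = react_latency_ms * speedup
reduction = latency_reduction(speedup)

print(f"implied baseline latency: {implied_baseline_ms:.1f} ms")
print(f"latency reduction: {reduction:.0%}")
```

A speedup of s always corresponds to a latency reduction of 1 - 1/s, so the two figures are interchangeable ways of stating one result.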

📝 Abstract
Scene Graph Generation (SGG) is a task that encodes visual relationships between objects in images as graph structures. SGG shows significant promise as a foundational component for downstream tasks, such as reasoning for embodied agents. To enable real-time applications, SGG must address the trade-off between performance and inference speed. However, current methods tend to focus on one of the following: (1) improving relation prediction accuracy, (2) enhancing object detection accuracy, or (3) reducing latency, without aiming to balance all three objectives simultaneously. To address this limitation, we propose a novel architecture, inference method, and relation prediction model. Our proposed solution, the REACT model, achieves the highest inference speed among existing SGG models, improving object detection accuracy without sacrificing relation prediction performance. Compared to state-of-the-art approaches, REACT is 2.7 times faster (with a latency of 23 ms) and improves object detection accuracy by 58.51%. Furthermore, our proposal significantly reduces model size, with an average of 5.5x fewer parameters. Code is available at https://github.com/Maelic/SGG-Benchmark
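
As a rough illustration of what "encoding visual relationships as graph structures" means, the sketch below stores detected objects as nodes and relations as (subject, predicate, object) triples. The class names, fields, and example labels are hypothetical and are not taken from REACT or the SGG-Benchmark code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DetectedObject:
    """One node of the graph: a detected object (hypothetical fields)."""
    label: str
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    score: float


@dataclass
class SceneGraph:
    """Objects are nodes; each relation is a directed, labeled edge."""
    objects: list[DetectedObject] = field(default_factory=list)
    # Each relation is (subject index, predicate, object index).
    relations: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, obj: DetectedObject) -> int:
        """Append a node and return its index."""
        self.objects.append(obj)
        return len(self.objects) - 1

    def add_relation(self, subj: int, predicate: str, obj: int) -> None:
        """Add an edge between two existing nodes."""
        for idx in (subj, obj):
            if not 0 <= idx < len(self.objects):
                raise IndexError(f"no object with index {idx}")
        self.relations.append((subj, predicate, obj))

    def triples(self) -> list[tuple[str, str, str]]:
        """Return relations as readable (subject, predicate, object) labels."""
        return [
            (self.objects[s].label, p, self.objects[o].label)
            for s, p, o in self.relations
        ]


# Illustrative image content, invented for this example.
graph = SceneGraph()
person = graph.add_object(DetectedObject("person", (40, 20, 120, 200), 0.94))
horse = graph.add_object(DetectedObject("horse", (30, 90, 220, 260), 0.91))
grass = graph.add_object(DetectedObject("grass", (0, 220, 320, 320), 0.88))
graph.add_relation(person, "riding", horse)
graph.add_relation(horse, "on", grass)

print(graph.triples())
```

An SGG model has to produce both parts of this structure, which is why object detection accuracy and relation prediction accuracy appear as separate objectives alongside latency.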
Problem

Research questions and friction points this paper is trying to address.

Balancing performance and speed trade-offs in Scene Graph Generation
Addressing the limitation of focusing on single objectives in SGG
Achieving real-time inference without sacrificing accuracy in SGG
Innovation

Methods, ideas, or system contributions that make the work stand out.

Balances relation prediction accuracy, object detection accuracy, and inference speed simultaneously
Achieves the fastest inference speed among existing SGG models (23 ms latency)
Significantly reduces model size, with 5.5x fewer parameters on average
Maëlic Neau
College of Science and Engineering, Flinders University, Australia
Paulo E. Santos
College of Science and Engineering, Flinders University, Australia
Anne-Gwenn Bosser
Ecole Nationale d’Ingénieurs de Brest, France
Cédric Buche
Full Professor, ENIB
Artificial Intelligence / Virtual Reality