REACT++: Efficient Cross-Attention for Real-Time Scene Graph Generation

📅 2026-03-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing scene graph generation methods struggle to achieve high relation prediction accuracy, precise object detection, and fast inference at the same time, which limits their use in real-time scenarios. This work proposes a lightweight model for real-time scene graph generation that introduces, for the first time, a subject-to-object cross-attention mechanism in prototype space. By combining prototype-based feature representations with a streamlined architecture, the model improves both relation prediction accuracy and inference efficiency without compromising object detection performance. Compared to the previous REACT model, the proposed approach is 20% faster at inference and gains 10% in relation prediction accuracy on average, making it the fastest scene graph generation method to date.

📝 Abstract

Scene Graph Generation (SGG) is a task that encodes visual relationships between objects in images as graph structures. SGG shows significant promise as a foundational component for downstream tasks, such as reasoning for embodied agents. To enable real-time applications, SGG must address the trade-off between performance and inference speed. However, current methods tend to focus on one of the following: (1) improving relation prediction accuracy, (2) enhancing object detection accuracy, or (3) reducing latency, without aiming to balance all three objectives simultaneously. To address this limitation, we build on the powerful Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation (REACT) architecture and propose REACT++, a new state-of-the-art model for real-time SGG. By leveraging efficient feature extraction and subject-to-object cross-attention within the prototype space, REACT++ balances latency and representational power. REACT++ achieves the highest inference speed among existing SGG models, improving relation prediction accuracy without sacrificing object detection performance. Compared to the previous REACT version, REACT++ is 20% faster with a gain of 10% in relation prediction accuracy on average. The code is available at https://github.com/Maelic/SGG-Benchmark.
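
To make the "graph structures" mentioned in the abstract concrete, here is a minimal sketch of how a scene graph can be held in memory as a list of detected objects plus (subject, predicate, object) relations. The class names `Relation` and `SceneGraph` and the example labels are hypothetical and chosen only for illustration; they are not taken from the REACT++ code base. Real SGG outputs would normally also carry bounding boxes and confidence scores.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Relation:
    """A directed edge: subject and object are indices into SceneGraph.objects."""
    subject: int
    predicate: str
    object: int


@dataclass
class SceneGraph:
    """Hypothetical container: object labels are nodes, relations are edges."""
    objects: List[str]
    relations: List[Relation]

    def triplets(self) -> List[Tuple[str, str, str]]:
        # Resolve node indices to labels for a human-readable view.
        return [
            (self.objects[r.subject], r.predicate, self.objects[r.object])
            for r in self.relations
        ]


# Illustrative data only, not results from the paper.
graph = SceneGraph(
    objects=["person", "horse", "field"],
    relations=[Relation(0, "riding", 1), Relation(1, "standing on", 2)],
)
print(graph.triplets())
# [('person', 'riding', 'horse'), ('horse', 'standing on', 'field')]
```

Storing relations as index pairs keeps object detection (the nodes) and relation prediction (the edges) separable, which mirrors the two accuracy objectives the abstract distinguishes.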

Problem

Research questions and friction points this paper is trying to address.

Scene Graph Generation

real-time inference

relation prediction

object detection

latency

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-attention

real-time scene graph generation

prototype space
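
As a rough illustration of the cross-attention and prototype space keywords above, the following sketch shows generic scaled dot-product cross-attention in which subject embeddings act as queries and object embeddings act as keys and values. It assumes both sets of embeddings have already been projected into a shared space. This is the standard attention formulation, not the REACT++ implementation, whose details are not given on this page; the function names and toy vectors are hypothetical. Pure Python is used to keep the example dependency-free, whereas a real model would use a tensor library.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    queries: subject vectors, each of length d (assumed already projected)
    keys, values: object vectors; keys must also have length d
    Returns one attended vector per query and the attention weights.
    """
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in queries:
        # Similarity of this subject to every object.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        # Weighted sum of the object value vectors.
        out = [
            sum(wj * v[i] for wj, v in zip(w, values))
            for i in range(len(values[0]))
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy 2-D embeddings, for illustration only.
subjects = [[1.0, 0.0]]
objects = [[1.0, 0.0], [0.0, 1.0]]
attended, attn = cross_attention(subjects, objects, objects)
```

In this toy case the subject attends more strongly to the first object, whose embedding points in the same direction, so the attended vector is pulled toward it.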