DSFlash: Comprehensive Panoptic Scene Graph Generation in Realtime

📅 2026-03-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing scene graph generation methods struggle to simultaneously achieve real-time performance, resource efficiency, and comprehensive relationship coverage, hindering their deployment on edge devices. To address this challenge, this work proposes DSFlash, a lightweight panoptic scene graph generation model that jointly models panoptic semantics and instance-level information while optimizing both inference and training pipelines. DSFlash enables real-time, complete scene graph generation without compromising accuracy, achieving 56 FPS on an RTX 3090 and matching state-of-the-art performance. Notably, it can be fully trained within 24 hours on a single GTX 1080 GPU, substantially lowering the hardware barrier and marking the first demonstration of efficient, holistic, and real-time scene graph generation.

📝 Abstract
Scene Graph Generation (SGG) aims to extract a detailed graph structure from an image, a representation that holds significant promise as a robust intermediate step for complex downstream tasks like reasoning for embodied agents. However, practical deployment in real-world applications, especially on resource-constrained edge devices, requires speed and resource efficiency, challenges that have received limited attention in existing research. To bridge this gap, we introduce DSFlash, a low-latency model for panoptic scene graph generation designed to overcome these limitations. DSFlash can process a video stream at 56 frames per second on a standard RTX 3090 GPU, without compromising performance against existing state-of-the-art methods. Crucially, unlike prior approaches that often restrict themselves to salient relationships, DSFlash computes comprehensive scene graphs, offering richer contextual information while maintaining its low latency. Furthermore, DSFlash is light on resources, requiring less than 24 hours to train on a single, nine-year-old GTX 1080 GPU. This accessibility makes DSFlash particularly well-suited for researchers and practitioners operating with limited computational resources, empowering them to adapt and fine-tune SGG models for specialized applications.
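
To make the abstract's numbers concrete, here is a minimal Python sketch that is not taken from the paper. It assumes a scene graph is stored as (subject, predicate, object) triplets over segment indices, and it reads "comprehensive" as considering every ordered pair of distinct segments. The names `Relation`, `frame_budget_ms`, and `candidate_pairs`, as well as the toy scene, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Relation:
    """One directed edge of a scene graph (illustrative layout, not DSFlash's)."""
    subject: int    # index of the subject segment
    predicate: str  # relationship label, e.g. "riding"
    object: int     # index of the object segment


def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given throughput."""
    return 1000.0 / fps


def candidate_pairs(num_segments: int) -> list[tuple[int, int]]:
    """All ordered (subject, object) pairs between distinct segments."""
    return list(permutations(range(num_segments), 2))


# Hypothetical toy scene; the labels and relations are made up for this sketch.
segments = ["person", "horse", "grass"]
graph = [Relation(0, "riding", 1), Relation(1, "standing on", 2)]

budget = frame_budget_ms(56)             # roughly 17.9 ms per frame at 56 FPS
pairs = candidate_pairs(len(segments))   # 3 * 2 = 6 ordered pairs
```

Under these assumptions, 56 frames per second leaves an average of about 17.9 ms per frame, while the number of candidate pairs grows as n(n-1) with the number of segments n. That quadratic growth suggests why covering all relationships rather than only salient ones puts pressure on latency.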
Problem

Research questions and friction points this paper is trying to address.

Scene Graph Generation
Real-time Processing
Resource Efficiency
Panoptic Scene Understanding
Edge Deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

real-time panoptic scene graph generation
low-latency SGG
resource-efficient training
comprehensive relationship modeling
edge-deployable vision model