FDSG: Forecasting Dynamic Scene Graphs

📅 2025-06-02

🤖 AI Summary

Existing methods for dynamic scene graph generation fall short in one of two ways: they either generate scene graphs for observed frames without explicitly modeling temporal dynamics, or they predict only future relationships while assuming that entity labels and locations stay fixed. Neither jointly models how entities and relationships evolve over time. This work introduces Scene Graph Forecasting, a new task that requires generating scene graphs for observed frames and forecasting entity labels, bounding boxes, and relationships for unobserved future frames. The proposed framework, FDSG, uses query decomposition and neural stochastic differential equations (neural SDEs) in its scene graph forecast module to model entity and relationship dynamics, and a temporal aggregation module that integrates forecasted and observed information via cross-attention. On the Action Genome dataset, FDSG outperforms state-of-the-art methods on dynamic scene graph generation, scene graph anticipation, and scene graph forecasting.

📝 Abstract

Dynamic scene graph generation extends scene graph generation from images to videos by modeling entity relationships and their temporal evolution. However, existing methods either generate scene graphs from observed frames without explicitly modeling temporal dynamics, or predict only relationships while assuming static entity labels and locations. These limitations hinder effective extrapolation of both entity and relationship dynamics, restricting video scene understanding. We propose Forecasting Dynamic Scene Graphs (FDSG), a novel framework that predicts future entity labels, bounding boxes, and relationships for unobserved frames, while also generating scene graphs for observed frames. Our scene graph forecast module leverages query decomposition and neural stochastic differential equations to model entity and relationship dynamics. A temporal aggregation module further refines predictions by integrating forecasted and observed information via cross-attention. To benchmark FDSG, we introduce Scene Graph Forecasting, a new task for full future scene graph prediction. Experiments on Action Genome show that FDSG outperforms state-of-the-art methods on dynamic scene graph generation, scene graph anticipation, and scene graph forecasting. Code will be released upon publication.
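
To make the neural SDE ingredient concrete, here is a minimal sketch of how a latent state can be rolled forward in time with the Euler-Maruyama scheme. It is not the FDSG implementation: in a neural SDE the drift and diffusion terms are learned networks, whereas `drift` and `diffusion` below are hypothetical fixed functions standing in for them, and the latent vector, step count, and time span are illustrative assumptions.

```python
import math
import random

def euler_maruyama(z0, drift, diffusion, t0, t1, steps, rng):
    """Integrate dz = drift(z, t) dt + diffusion(z, t) dW from t0 to t1.

    Returns the list of states visited, including the initial one.
    """
    dt = (t1 - t0) / steps
    z, t = list(z0), t0
    path = [list(z)]
    for _ in range(steps):
        f = drift(z, t)
        g = diffusion(z, t)
        # Deterministic drift step plus Gaussian noise scaled by sqrt(dt).
        z = [zi + fi * dt + gi * math.sqrt(dt) * rng.gauss(0.0, 1.0)
             for zi, fi, gi in zip(z, f, g)]
        t += dt
        path.append(list(z))
    return path

def drift(z, t):
    # Hypothetical stand-in for a learned drift network: decay toward zero.
    return [-0.5 * zi for zi in z]

def diffusion(z, t):
    # Hypothetical stand-in for a learned diffusion network: constant noise.
    return [0.1 for _ in z]

def no_noise(z, t):
    # Zero diffusion turns the SDE into an ordinary differential equation.
    return [0.0 for _ in z]

# Two sampled futures from the same observed latent state (illustrative values).
z_observed = [1.0, -0.3]
future_a = euler_maruyama(z_observed, drift, diffusion, 0.0, 1.0, 10, random.Random(0))
future_b = euler_maruyama(z_observed, drift, diffusion, 0.0, 1.0, 10, random.Random(1))

# Noise-free rollout, used as a sanity check on the integrator.
deterministic = euler_maruyama([1.0], drift, no_noise, 0.0, 1.0, 10, random.Random(0))
```

Different random seeds give different trajectories from the same starting state, which is what lets an SDE-based forecaster represent uncertainty about the future rather than a single point estimate.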

Problem

Research questions and friction points this paper is trying to address.

Predicting future entity labels, bounding boxes, and relationships in videos

Modeling temporal dynamics of entities and relationships

Overcoming the limits of existing dynamic scene graph methods, which either cover only observed frames or assume static entities
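
The points above can be illustrated with a small data-structure sketch. It is only an assumption about how a per-frame scene graph might be represented, not the paper's code: the class names, the `changed_components` helper, and the example labels, boxes, and predicates are all hypothetical. The idea is that a full forecast may change entity labels, boxes, and relationships, whereas a method that assumes static entities can only change the relationships.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class Entity:
    label: str
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

@dataclass
class Relation:
    subject: int    # index into SceneGraph.entities
    predicate: str
    object: int     # index into SceneGraph.entities

@dataclass
class SceneGraph:
    frame: int
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Return (subject label, predicate, object label) for each relation."""
        return [(self.entities[r.subject].label, r.predicate,
                 self.entities[r.object].label) for r in self.relations]

def changed_components(a: SceneGraph, b: SceneGraph) -> Set[str]:
    """Report which parts of the graph differ between two frames."""
    changed = set()
    if [e.label for e in a.entities] != [e.label for e in b.entities]:
        changed.add("labels")
    if [e.box for e in a.entities] != [e.box for e in b.entities]:
        changed.add("boxes")
    if a.triplets() != b.triplets():
        changed.add("relations")
    return changed

# Last observed frame (illustrative values only).
observed = SceneGraph(
    frame=10,
    entities=[Entity("person", (10, 20, 110, 220)), Entity("cup", (90, 100, 120, 140))],
    relations=[Relation(0, "holding", 1)],
)

# A forecast for a later frame: the cup has moved and the relationship changed.
forecast = SceneGraph(
    frame=15,
    entities=[Entity("person", (10, 20, 110, 220)), Entity("cup", (80, 60, 110, 100))],
    relations=[Relation(0, "drinking_from", 1)],
)

diff = changed_components(observed, forecast)
```

Here `diff` contains both `"boxes"` and `"relations"`; a predictor restricted to static entities could never produce the `"boxes"` change.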

Innovation

Methods, ideas, or system contributions that make the work stand out.

Predicts future entity labels, bounding boxes, and relationships

Uses query decomposition and neural SDEs

Integrates forecasted and observed information via cross-attention
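
As a rough illustration of the cross-attention idea in the last point, the sketch below lets forecasted feature vectors act as queries over observed feature vectors using plain scaled dot-product attention. It is a simplification under stated assumptions, not the FDSG temporal aggregation module: learned projections, multiple heads, and residual connections are omitted, and the names `cross_attention`, `forecast_feats`, and `observed_feats` are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    Returns (outputs, weights), where weights[i][j] is how much
    query i attends to key j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output channel at a time.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

# One forecasted feature attends over two observed features (illustrative values).
forecast_feats = [[1.0, 0.0]]
observed_feats = [[1.0, 0.0], [0.0, 1.0]]

refined, attn = cross_attention(forecast_feats, observed_feats, observed_feats)
```

The forecasted query is more similar to the first observed vector, so it receives the larger attention weight and the refined output is pulled toward it.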
