Dynamic Graph Neural Network with Adaptive Features Selection for RGB-D Based Indoor Scene Recognition

📅 2026-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the underutilization of multimodal local features in RGB-D indoor scene recognition by proposing a dynamic graph neural network approach. The method employs an adaptive node selection mechanism to extract salient local features from both RGB and depth modalities, constructs a hierarchical graph structure based on spatial relationships among objects, and leverages an attention mechanism to dynamically update graph connections for effective cross-modal feature fusion. Experimental results on the SUN RGB-D and NYU Depth v2 datasets demonstrate that the proposed approach significantly outperforms existing state-of-the-art methods, validating its capability to adaptively discover and integrate discriminative local features from dual modalities.
📝 Abstract
Multi-modality of color and depth, i.e., RGB-D, is of great importance in recent research on indoor scene recognition. In this data representation, the depth map describes the 3D structure of scenes and the geometric relations among objects. Previous works showed that local features from both modalities are vital for improving recognition accuracy. However, the problem of adaptively selecting and effectively exploiting these key local features remains open in this field. In this paper, a dynamic graph model with an adaptive node selection mechanism is proposed to solve this problem. In this model, a dynamic graph is built to model the relations among objects and the scene, and an adaptive node selection method is proposed to pick key local features from both the RGB and depth modalities for graph modeling. These nodes are then grouped into three levels, representing near and far relations among objects. Moreover, the graph model is updated dynamically according to attention weights. Finally, the updated and optimized features of the RGB and depth modalities are fused for indoor scene recognition. Experiments are performed on the public SUN RGB-D and NYU Depth v2 datasets. Extensive results demonstrate that our method has superior performance compared to state-of-the-art methods, and show that the proposed method is able to exploit crucial local features from both the RGB and depth modalities.
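The pipeline the abstract describes (adaptive node selection, attention-weighted graph update, cross-modal fusion) can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the saliency criterion (feature L2 norm), the dot-product attention, the single message-passing step, and the mean-pool concatenation fusion are all simplifying assumptions.

```python
import numpy as np

def select_nodes(features, k):
    """Adaptive node selection: keep the k local features with the
    largest L2 norm, a crude stand-in for a learned saliency score."""
    scores = np.linalg.norm(features, axis=1)
    idx = np.argsort(scores)[-k:]
    return features[idx]

def attention_adjacency(nodes):
    """Soft adjacency from pairwise dot-product attention; each row is
    softmax-normalized, standing in for the attention-based graph update."""
    logits = nodes @ nodes.T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def graph_step(nodes, adj):
    """One message-passing step: aggregate neighbor features using the
    attention weights (no learned parameters in this sketch)."""
    return adj @ nodes

def fuse(rgb_nodes, depth_nodes):
    """Late fusion: concatenate mean-pooled per-modality node features."""
    return np.concatenate([rgb_nodes.mean(axis=0), depth_nodes.mean(axis=0)])

rng = np.random.default_rng(0)
rgb_feats = rng.normal(size=(16, 8))    # 16 local RGB features, dim 8
depth_feats = rng.normal(size=(16, 8))  # 16 local depth features, dim 8

rgb_nodes = select_nodes(rgb_feats, k=6)
depth_nodes = select_nodes(depth_feats, k=6)
rgb_nodes = graph_step(rgb_nodes, attention_adjacency(rgb_nodes))
depth_nodes = graph_step(depth_nodes, attention_adjacency(depth_nodes))
scene_feature = fuse(rgb_nodes, depth_nodes)  # a 16-dim scene descriptor
```

In the paper, the selected nodes are additionally grouped into three levels by spatial distance before the attention update; that hierarchy is omitted here for brevity.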
Problem

Research questions and friction points this paper is trying to address.

RGB-D
indoor scene recognition
adaptive feature selection
multi-modality
local features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Graph Neural Network
Adaptive Feature Selection
RGB-D Scene Recognition
Multi-modal Fusion
Attention-based Graph Update
Qiong Liu
School of Electronic Information and Communications, Huazhong University of Science and Technology
video coding, image processing, 3D video
Ruofei Xiong
School of Electronic Information and Communications, Huazhong University of Science and Technology, No. 1037, Luoyu Rd., Wuhan, 430074, P. R. China
Xingzhen Chen
School of Electronic Information and Communications, Huazhong University of Science and Technology, No. 1037, Luoyu Rd., Wuhan, 430074, P. R. China
Muyao Peng
Huazhong University of Science and Technology
Computer Vision, Robotics
You Yang
Huazhong University of Science and Technology
3D video communications, computational and impulse imaging