HexPlane Representation for 3D Semantic Scene Understanding

📅 2025-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of semantic understanding in sparse and unstructured 3D point clouds, this paper introduces HexPlane, a representation that orthogonally projects point clouds onto six 2D planes. Lightweight 2D CNN or Transformer encoders extract features from each plane, while a View Projection Module (VPM) and a HexPlane Association Module (HAM) enable adaptive multi-view feature fusion to generate high-fidelity per-point semantic representations. HexPlane establishes the first hexahedral projection paradigm, preserving full spatial modeling capacity while significantly improving computational efficiency. It is plug-and-play: it requires no modification to backbone architectures and enhances existing point-, voxel-, or range-based methods. On ScanNet 3D semantic segmentation, HexPlane achieves 77.0 mIoU, surpassing Point Transformer V2 by 1.6 points. It also delivers substantial improvements on SemanticKITTI and indoor 3D detection benchmarks.

📝 Abstract
In this paper, we introduce the HexPlane representation for 3D semantic scene understanding. Specifically, we first design the View Projection Module (VPM) to project the 3D point cloud onto six planes so as to maximally retain the original spatial information. Features of the six planes are extracted by a 2D encoder and sent to the HexPlane Association Module (HAM), which adaptively fuses the most informative features for each point. The fused point features are further fed to the task head to yield the final predictions. Compared to the popular point and voxel representations, the HexPlane representation is efficient and can utilize highly optimized 2D operations to process sparse and unordered 3D point clouds. It can also leverage off-the-shelf 2D models, network weights, and training recipes to achieve accurate scene understanding in 3D space. On the ScanNet and SemanticKITTI benchmarks, our algorithm, dubbed HexNet3D, achieves competitive performance with previous algorithms. In particular, on the ScanNet 3D segmentation task, our method obtains 77.0 mIoU on the validation set, surpassing Point Transformer V2 by 1.6 mIoU. We also observe encouraging results on indoor 3D detection tasks. Note that our method can be seamlessly integrated into existing voxel-based, point-based, and range-based approaches and brings considerable gains without bells and whistles. The code will be available upon publication.
Problem

Research questions and friction points this paper is trying to address.

Introduces HexPlane for 3D semantic scene understanding.
Projects 3D point clouds into six planes efficiently.
Achieves competitive performance on ScanNet and SemanticKITTI benchmarks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

HexPlane representation for 3D scene understanding
View Projection Module retains spatial information
HexPlane Association Module fuses informative features
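The six-plane projection at the core of the method can be illustrated with a minimal NumPy sketch. The function below is a hypothetical stand-in for the View Projection Module: it rasterizes a point cloud into six axis-aligned depth maps, one per face of the bounding cube. The paper's actual VPM is learned and feeds 2D encoders whose features are then fused per point by the HAM; none of the names or choices here (resolution, max-depth rasterization) come from the paper.

```python
import numpy as np

def hexplane_project(points, resolution=64):
    """Rasterize an (N, 3) point cloud into six axis-aligned depth maps.

    Hypothetical sketch of a hexahedral projection: one map per face of
    the normalized bounding cube (+x, -x, +y, -y, +z, -z). Each pixel
    keeps the point closest to that face.
    """
    # Normalize points into the unit cube [0, 1]^3.
    lo, hi = points.min(0), points.max(0)
    pts = (points - lo) / np.maximum(hi - lo, 1e-8)

    planes = np.zeros((6, resolution, resolution), dtype=np.float32)
    for axis in range(3):
        # Drop one coordinate: the remaining two index the 2D plane,
        # and the dropped one acts as depth toward the two faces.
        uv = np.delete(pts, axis, axis=1)                     # (N, 2)
        ij = np.clip((uv * (resolution - 1)).astype(int), 0, resolution - 1)
        depth = pts[:, axis]
        for k, d in enumerate((depth, 1.0 - depth)):          # opposite faces
            # Keep the nearest point per pixel (largest proximity 1 - d);
            # ufunc.at handles repeated pixel indices correctly.
            np.maximum.at(planes[2 * axis + k], (ij[:, 0], ij[:, 1]), 1.0 - d)
    return planes
```

In the full pipeline, each of the six maps would be passed through an off-the-shelf 2D encoder, and each 3D point would gather features back from its six projected pixel locations for fusion.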
Zeren Chen
School of Software, Beihang University

Yuenan Hou
Shanghai AI Laboratory
Autonomous Driving · Embodied AI · Efficient Learning

Yulin Chen
Shanghai AI Laboratory

Li Liu
College of Electronic Science and Technology, National University of Defense Technology (NUDT)

Xiao Sun
Shanghai AI Laboratory

Lu Sheng
School of Software, Beihang University
Embodied AI · 3D Vision · Machine Learning