AI Summary
Existing projection-based point cloud segmentation methods employ fixed, view-agnostic rays, suffering from limited geometric adaptability and projection diversity; moreover, multi-view fusion incurs substantial computational overhead. To address these limitations, we propose a learning-driven, view-dependent projection (VDP) framework. Specifically, a learnable ray generation network models the 3D-to-2D mapping via dynamic, firework-like ray trajectories, enabling view-aware adaptive projection. Additionally, a color regularization constraint is introduced to optimize the semantic pixel distribution in the 2D projection, enhancing the information density and utilization of projected features. The resulting method is lightweight and computationally efficient, achieving competitive performance on the S3DIS and ScanNet benchmarks. It significantly reduces redundant computation while maintaining high accuracy, establishing an efficient, low-overhead paradigm for 3D semantic understanding.
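The core idea of a learnable, view-conditioned 3D-to-2D mapping can be sketched as follows. This is an illustrative toy, not the authors' actual network: it applies a naive perspective projection and adds a bounded 2D offset predicted from each point's coordinates and the view direction (the single linear layer with weights `W`, `b` stands in for the learnable ray generation network).

```python
import numpy as np

def view_dependent_projection(points, view_dir, W, b):
    """Toy sketch of a view-dependent projection: a learned,
    view-conditioned offset perturbs a naive perspective projection.
    (Assumed structure for illustration, not the paper's model.)"""
    # Condition each point on the view: concatenate xyz with the view direction.
    feats = np.concatenate(
        [points, np.tile(view_dir, (len(points), 1))], axis=1)   # (N, 6)
    # A single linear layer + tanh stands in for the ray generation network;
    # tanh bounds the learned ray offsets to [-1, 1].
    offset = np.tanh(feats @ W + b)                              # (N, 2)
    # Naive perspective projection as the base ray (straight-line projection).
    base = points[:, :2] / (points[:, 2:3] + 1e-6)               # (N, 2)
    return base + offset

rng = np.random.default_rng(0)
pts = rng.uniform(0.5, 2.0, size=(128, 3))    # synthetic points in front of the camera
view = np.array([0.0, 0.0, 1.0])              # viewing direction along +z
W = rng.normal(scale=0.1, size=(6, 2))        # placeholder "learned" weights
b = np.zeros(2)
uv = view_dependent_projection(pts, view, W, b)
print(uv.shape)  # (128, 2): one 2D pixel coordinate per 3D point
```

In a real model the offset head would be a deeper network trained end-to-end with the segmentation loss, so the rays bend adaptively per view rather than following fixed, hand-set parameters.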
Abstract
In this paper, we propose view-dependent projection (VDP) to facilitate point cloud segmentation, designing an efficient 3D-to-2D mapping that dynamically adapts to spatial geometry under view variations. Existing projection-based methods leverage view-independent projection in complex scenes, relying on straight lines to generate direct rays or upward curves to reduce occlusions. However, this view independence restricts projection rays to manually pre-defined parameters, limiting point awareness and failing to capture sufficient projection diversity across different view planes. Although multiple projections per view plane are commonly used to enhance spatial variety, the resulting redundancy leads to excessive computational overhead and inefficient image processing. To address these limitations, we design a VDP framework that generates data-driven projections from 3D point distributions, producing highly informative single-image inputs by predicting rays inspired by the adaptive behavior of fireworks. In addition, we construct a color regularization to optimize the framework, which emphasizes essential features within semantic pixels and suppresses non-semantic features within black pixels, thereby maximizing 2D space utilization in the projected image. As a result, our approach, PointVDP, produces lightweight projections at marginal computation cost. Experiments on the S3DIS and ScanNet benchmarks show that our approach achieves competitive results, offering a resource-efficient solution for semantic understanding.
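The color regularization idea, suppressing non-semantic (black) pixels while rewarding semantic-pixel coverage, can be sketched with a minimal toy loss. This is an assumption-based illustration of the stated objective, not the paper's actual formulation: it penalizes residual intensity in near-black pixels and subtracts the fraction of the image occupied by semantic content, so lower values mean denser, cleaner projections.

```python
import numpy as np

def color_regularization(image, eps=1e-3):
    """Illustrative color-regularization loss (assumed form, not the
    paper's): suppress activations at black, non-semantic pixels and
    reward the fraction of semantic pixels in the projected image."""
    mag = np.linalg.norm(image, axis=-1)   # per-pixel color magnitude, (H, W)
    black = mag < eps                      # mask of non-semantic (black) pixels
    suppress = mag[black].sum()            # residual energy leaking into black pixels
    occupancy = 1.0 - black.mean()         # fraction of pixels carrying semantics
    return suppress - occupancy            # lower loss = cleaner, denser projection

# A synthetic 8x8 projection where one quarter of the pixels carry content.
img = np.zeros((8, 8, 3))
img[:4, :4] = 0.5
loss = color_regularization(img)
print(loss)  # -0.25: zero leakage, 25% semantic occupancy
```

Trained jointly with the segmentation loss, such a term would push the predicted rays to pack semantic pixels tightly into the single projected image, which is the space-utilization goal the abstract describes.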