🤖 AI Summary
This work addresses the limited adaptability of static networks in handling 3D point cloud scenes characterized by geometric diversity, class imbalance, and highly variable spatial layouts. To this end, the authors propose PointTPA, a novel framework that, for the first time, introduces a test-time dynamic parameter adaptation mechanism to 3D scene understanding. PointTPA employs two lightweight modules—Serialization-based Neighborhood Grouping (SNG) and a Dynamic Parameter Projector (DPP)—to generate input-aware, locally conditioned network parameters during inference, at less than 2% additional parameter overhead. Integrated into the PTv3 backbone, PointTPA achieves a state-of-the-art mIoU of 78.4% on the ScanNet validation set, significantly outperforming existing parameter-efficient fine-tuning approaches and performing consistently well across multiple benchmarks.
📝 Abstract
Scene-level point cloud understanding remains challenging due to diverse geometries, imbalanced category distributions, and highly varied spatial layouts. Existing methods improve object-level performance but rely on static network parameters during inference, limiting their adaptability to dynamic scene data. We propose PointTPA, a Test-time Parameter Adaptation framework that generates input-aware network parameters for scene-level point clouds. PointTPA adopts Serialization-based Neighborhood Grouping (SNG) to form locally coherent patches and a Dynamic Parameter Projector (DPP) to produce patch-wise adaptive weights, enabling the backbone to adjust its behavior to scene-specific variations at low parameter overhead. Integrated into the PTv3 backbone, PointTPA remains parameter-efficient: its two lightweight modules add less than 2% of the backbone's parameters. Despite this minimal overhead, PointTPA achieves 78.4% mIoU on the ScanNet validation set, surpassing existing parameter-efficient fine-tuning (PEFT) methods across multiple benchmarks and highlighting the efficacy of our test-time dynamic parameter adaptation mechanism for 3D scene understanding. The code is available at https://github.com/H-EmbodVis/PointTPA.
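To make the patch-wise dynamic parameter idea concrete, here is a minimal NumPy sketch of a linear layer whose weights are modulated per patch. Everything here is an illustrative assumption, not the paper's implementation: the descriptor (centroid plus extent of a patch), the low-rank form of the weight delta, and all names (`patch_descriptor`, `DynamicLinear`) are hypothetical stand-ins for the actual SNG features and DPP design.

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_descriptor(points):
    # Hypothetical patch summary: centroid + per-axis extent -> 6-D vector.
    # A stand-in for whatever features the real SNG/DPP modules compute.
    extent = points.max(axis=0) - points.min(axis=0)
    return np.concatenate([points.mean(axis=0), extent])

class DynamicLinear:
    """Sketch of a patch-conditioned linear layer: a frozen base weight W
    plus a low-rank delta A @ B generated from the patch descriptor, so the
    extra parameters (the two small projectors) stay cheap."""

    def __init__(self, d_in, d_out, d_desc=6, rank=2):
        self.W = rng.normal(scale=0.1, size=(d_out, d_in))       # base weight
        # Projectors map the descriptor to the two low-rank factors.
        self.P_a = rng.normal(scale=0.1, size=(d_out * rank, d_desc))
        self.P_b = rng.normal(scale=0.1, size=(rank * d_in, d_desc))
        self.rank = rank

    def __call__(self, x, desc):
        d_out, d_in = self.W.shape
        A = (self.P_a @ desc).reshape(d_out, self.rank)
        B = (self.P_b @ desc).reshape(self.rank, d_in)
        # Input-aware weights: W varies with the patch at inference time.
        return x @ (self.W + A @ B).T

points = rng.normal(size=(128, 3))   # one serialized patch of 3D points
feats = rng.normal(size=(128, 16))   # per-point features for that patch
layer = DynamicLinear(16, 32)
out = layer(feats, patch_descriptor(points))
print(out.shape)  # (128, 32)
```

The low-rank factorization keeps the parameter count of the generator small relative to the base layer, which mirrors the paper's claim of under 2% overhead, though the actual DPP may be structured differently.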