Exploring 3D Dataset Pruning

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge in 3D dataset pruning where the long-tailed class distribution induces a conflict between optimizing overall accuracy (OA) and mean per-class accuracy (mAcc), making it difficult for existing methods to simultaneously preserve representativeness and align with evaluation metrics. The authors formulate pruning as approximating the expected risk of the full dataset via a weighted subset, and introduce a class-wise retention quota mechanism to ensure adequate coverage of tail classes. By integrating representation-aware subset selection with prior-invariant teacher supervision, the method enables soft-label calibration and embedding-geometry distillation. This approach is the first to systematically mitigate coverage error and prior-mismatch bias in 3D pruning. It achieves significant improvements in both OA and mAcc across multiple 3D datasets while flexibly accommodating downstream task preferences, demonstrating its effectiveness and generalizability.
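The risk-matching formulation sketched above can be written compactly; the symbols below are chosen here for illustration and are not taken from the paper:

```latex
R(\theta) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(f_\theta(x),y)\big]
\;\approx\; \sum_{i\in S} w_i\,\ell\big(f_\theta(x_i),y_i\big),
\qquad \sum_{i\in S} w_i = 1,
```

where $S$ is the retained subset and $w_i$ are sample weights. The approximation gap then splits into the two errors the paper names: a coverage error when $S$ misses regions of the data distribution (especially tail classes), and a prior-mismatch bias when the class prior implied by the weights $w_i$ differs from the prior the target metric assumes (uniform over samples for OA, uniform over classes for mAcc).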

📝 Abstract
Dataset pruning has been widely studied for 2D images to remove redundancy and accelerate training, while pruning methods tailored to 3D data remain largely unexplored. In this work, we study dataset pruning for 3D data, where the commonly observed long-tailed class distribution makes optimization under the conventional evaluation metrics Overall Accuracy (OA) and Mean Accuracy (mAcc) inherently conflicting, and makes pruning particularly challenging. To address this, we formulate pruning as approximating the full-data expected risk with a weighted subset, which reveals two key errors: coverage error from insufficient representativeness, and prior-mismatch bias from inconsistency between subset-induced class weights and target metrics. We propose representation-aware subset selection with per-class retention quotas for long-tail coverage, and prior-invariant teacher supervision using calibrated soft labels and embedding-geometry distillation. The retention quota also serves as a switch to control the OA-mAcc trade-off. Extensive experiments on 3D datasets show that our method improves both metrics across multiple settings while adapting to different downstream preferences. Our code is available at https://github.com/XiaohanZhao123/3D-Dataset-Pruning.
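The per-class retention quota described in the abstract can be illustrated with a minimal sketch. The function below is a hypothetical reconstruction, not the authors' implementation: it assumes each sample has a precomputed representativeness score, allocates each class a proportional budget floored at a minimum quota (protecting tail classes), and keeps the highest-scoring samples per class.

```python
import numpy as np

def quota_subset(labels, scores, keep_ratio, min_quota=5):
    """Sketch of coverage-preserving subset selection with per-class quotas.

    labels: (N,) integer class labels.
    scores: (N,) representativeness scores (higher = more representative);
            how these are computed is an assumption left open here.
    keep_ratio: overall fraction of the dataset to retain.
    min_quota: floor on samples kept per class, capped by class size.
    Returns sorted indices of the retained subset.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    classes, counts = np.unique(labels, return_counts=True)
    # Proportional per-class budget, floored at min_quota, capped at class size.
    budget = np.minimum(
        counts,
        np.maximum(min_quota, np.round(counts * keep_ratio).astype(int)),
    )
    keep = []
    for c, b in zip(classes, budget):
        idx = np.flatnonzero(labels == c)
        # Retain the b highest-scoring samples of this class.
        keep.extend(idx[np.argsort(-scores[idx])[:b]])
    return np.sort(np.array(keep))
```

Raising `min_quota` shifts the selection toward tail classes (favoring mAcc), while lowering it toward proportional sampling (favoring OA), which is consistent with the abstract's description of the quota as a switch for the OA-mAcc trade-off.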
Problem

Research questions and friction points this paper is trying to address.

3D dataset pruning
long-tail class distribution
Overall Accuracy
Mean Accuracy
dataset redundancy
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D dataset pruning
long-tail distribution
representation-aware selection
prior-invariant distillation
coverage-preserving quota