From Theory to Throughput: CUDA-Optimized APML for Large-Batch 3D Learning

📅 2025-12-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
In 3D point cloud learning, geometric loss functions trade accuracy against computational efficiency: Chamfer Distance (CD) is efficient but lacks one-to-one correspondence constraints, while Earth Mover's Distance (EMD) is accurate yet prohibitively expensive. APML approximates EMD via differentiable Sinkhorn iterations, but its dense implementation incurs O(N²) memory complexity, limiting scalability. This paper proposes CUDA-APML, an EMD-based loss that exploits COO-format sparsity to run adaptive softmax, bidirectional symmetrization, and Sinkhorn normalization on a thresholded support, achieving near-linear O(N) memory growth. The method remains fully differentiable, preserving gradients on the stored sparse support. Experiments on ShapeNet and MM-Fi show CUDA-APML attains reconstruction errors below 1e-3, matching dense APML in accuracy while reducing peak GPU memory by 99.9% and enabling ultra-large-batch training.
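To make the COO-format idea concrete, here is a minimal PyTorch sketch of the pruning step. This is not the authors' released kernel; the function name, threshold rule, and default values are illustrative assumptions. The distance matrix is still formed densely, matching the abstract's note that pairwise distance evaluation remains quadratic in the current implementation.

```python
import math
import torch

def dense_to_coo_support(x, y, temperature=0.05, eps=1e-4):
    """Prune pairwise distances to a sparse COO support (illustrative).

    Keeps only pairs whose softmax weight could exceed `eps`:
    exp(-(d - d_min) / T) >= eps  <=>  d <= d_min + T * log(1 / eps).

    x: (N, 3) predicted points; y: (M, 3) target points.
    Returns (rows, cols, vals): COO indices and distances on the support.
    """
    d = torch.cdist(x, y)                        # (N, M) pairwise distances, dense
    row_min = d.min(dim=1, keepdim=True).values  # nearest-neighbor distance per row
    cutoff = row_min + temperature * math.log(1.0 / eps)
    mask = d <= cutoff                           # retain near-minimal entries only
    rows, cols = mask.nonzero(as_tuple=True)
    return rows, cols, d[mask]
```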

📝 Abstract
Loss functions are fundamental to learning accurate 3D point cloud models, yet common choices trade geometric fidelity for computational cost. Chamfer Distance is efficient but permits many-to-one correspondences, while Earth Mover's Distance better reflects one-to-one transport at high computational cost. APML approximates transport with differentiable Sinkhorn iterations and an analytically derived temperature, but its dense formulation scales quadratically in memory. We present CUDA-APML, a sparse GPU implementation that thresholds negligible assignments and runs adaptive softmax, bidirectional symmetrization, and Sinkhorn normalization directly in COO form. This yields near-linear memory scaling and preserves gradients on the stored support, while pairwise distance evaluation remains quadratic in the current implementation. On ShapeNet and MM-Fi, CUDA-APML matches dense APML within a small tolerance while reducing peak GPU memory by 99.9%. Code available at: https://github.com/Multimodal-Sensing-Lab/apml
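The abstract's "Sinkhorn normalization directly in COO form" can be pictured as scatter-based row/column normalization over the retained triplets. The sketch below is an assumption about the mechanics, not the paper's CUDA kernel; names, the iteration count, and the temperature are placeholders.

```python
import torch

def sinkhorn_coo(rows, cols, vals, n, m, temperature=0.05, iters=20):
    """Sinkhorn row/column normalization on COO triplets (illustrative).

    rows, cols: (K,) indices of retained pairs; vals: (K,) distances.
    Returns (K,) transport weights, approximately doubly stochastic on the
    stored support; gradients flow through `vals` only, mirroring the
    "preserves gradients on the stored support" behavior described above.
    """
    p = torch.exp(-vals / temperature)  # unnormalized assignment weights
    for _ in range(iters):
        # Row pass: divide each entry by its row's total mass on the support.
        row_sum = torch.zeros(n, device=p.device).scatter_add_(0, rows, p)
        p = p / row_sum[rows]
        # Column pass: same, along the other axis.
        col_sum = torch.zeros(m, device=p.device).scatter_add_(0, cols, p)
        p = p / col_sum[cols]
    return p
```

Every retained row and column has at least one support entry by construction of the pruning step, so the scatter sums are strictly positive and the divisions are safe.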
Problem

Research questions and friction points this paper is trying to address.

Optimizes APML for large-batch 3D point cloud learning
Reduces memory scaling from quadratic to near-linear
Preserves geometric fidelity while cutting peak GPU memory by 99.9% (see the estimate after this list)
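To put the quadratic-to-near-linear claim in scale, a quick back-of-the-envelope estimate; the point count, batch size, and per-point candidate count below are illustrative assumptions, not numbers from the paper:

```python
# Illustrative memory estimate for the transport weights (float32 = 4 bytes).
N = 16_384  # points per cloud (assumed)
B = 32      # cloud pairs per batch (assumed)
k = 16      # retained candidates per point after thresholding (assumed)

dense_floats = B * N * N    # full N x N transport plan per cloud pair
sparse_floats = B * N * k   # COO support only
print(f"dense : {dense_floats * 4 / 2**30:.1f} GiB")  # -> dense : 32.0 GiB
print(f"sparse: {sparse_floats * 4 / 2**20:.1f} MiB")  # -> sparse: 32.0 MiB
# A ~1024x (i.e., ~99.9%) reduction, consistent with the headline number.
```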
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse GPU implementation with thresholded assignments
Adaptive softmax and bidirectional symmetrization in COO form (see the sketch after this list)
Near-linear memory scaling with preserved gradients
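How the symmetrized weights turn into a loss value is not spelled out on this page. One plausible minimal reading, with all names hypothetical: average the two softmax directions on a shared support, then take the transport-weighted mean distance.

```python
import torch

def sparse_apml_loss(dists, p_fwd, p_bwd):
    """Illustrative symmetric transport loss on a shared COO support.

    `p_fwd` and `p_bwd` are assignment weights from the prediction->target
    and target->prediction softmax directions, gathered onto the same
    (rows, cols) support; averaging them is one simple reading of
    "bidirectional symmetrization".
    """
    p_sym = 0.5 * (p_fwd + p_bwd)               # symmetrize both directions
    return (p_sym * dists).sum() / p_sym.sum()  # transport-weighted mean distance
```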
Sasan Sharifipour
PhD Researcher at University of Oulu
Graph Neural Networks · Complex Networks · Deep Learning · Computer Vision · Machine Learning
Constantino Álvarez Casado
Postdoctoral Researcher, University of Oulu
Computer Vision · Machine Learning · Deep Learning · Human Sensing · Digital Signal Processing
Manuel Lage Cañellas
Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Finland
Miguel Bordallo López
Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Finland