From Theory to Throughput: CUDA-Optimized APML for Large-Batch 3D Learning

📅 2025-12-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
In 3D point cloud learning, geometric loss functions trade accuracy against computational efficiency: Chamfer Distance (CD) is efficient but lacks one-to-one correspondence constraints, while Earth Mover's Distance (EMD) is accurate yet prohibitively expensive. APML approximates EMD via differentiable Sinkhorn iterations, but its dense implementation incurs O(N²) memory complexity, limiting scalability. This paper proposes CUDA-APML, an EMD-based loss that exploits COO-format sparsity to run adaptive softmax, bidirectional symmetrization, and Sinkhorn normalization on a thresholded support, achieving near-linear O(N) memory growth. The method remains fully differentiable, preserving gradients on the stored sparse support. Experiments on ShapeNet and MM-Fi show CUDA-APML attains reconstruction errors below 1e-3, matching dense APML in accuracy while reducing peak GPU memory by 99.9% and enabling ultra-large-batch training.
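To make the COO-format idea concrete, here is a minimal PyTorch sketch of the pruning step. This is not the authors' released kernel; the function name, threshold rule, and default values are illustrative assumptions. The distance matrix is still formed densely, matching the abstract's note that pairwise distance evaluation remains quadratic in the current implementation.

```python
import math
import torch

def dense_to_coo_support(x, y, temperature=0.05, eps=1e-4):
    """Prune pairwise distances to a sparse COO support (illustrative).

    Keeps only pairs whose softmax weight could exceed `eps`:
    exp(-(d - d_min) / T) >= eps  <=>  d <= d_min + T * log(1 / eps).

    x: (N, 3) predicted points; y: (M, 3) target points.
    Returns (rows, cols, vals): COO indices and distances on the support.
    """
    d = torch.cdist(x, y)                        # (N, M) pairwise distances, dense
    row_min = d.min(dim=1, keepdim=True).values  # nearest-neighbor distance per row
    cutoff = row_min + temperature * math.log(1.0 / eps)
    mask = d <= cutoff                           # retain near-minimal entries only
    rows, cols = mask.nonzero(as_tuple=True)
    return rows, cols, d[mask]
```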

📝 Abstract
Loss functions are fundamental to learning accurate 3D point cloud models, yet common choices trade geometric fidelity for computational cost. Chamfer Distance is efficient but permits many-to-one correspondences, while Earth Mover's Distance better reflects one-to-one transport at high computational cost. APML approximates transport with differentiable Sinkhorn iterations and an analytically derived temperature, but its dense formulation scales quadratically in memory. We present CUDA-APML, a sparse GPU implementation that thresholds negligible assignments and runs adaptive softmax, bidirectional symmetrization, and Sinkhorn normalization directly in COO form. This yields near-linear memory scaling and preserves gradients on the stored support, while pairwise distance evaluation remains quadratic in the current implementation. On ShapeNet and MM-Fi, CUDA-APML matches dense APML within a small tolerance while reducing peak GPU memory by 99.9%. Code available at: https://github.com/Multimodal-Sensing-Lab/apml
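The abstract's "Sinkhorn normalization directly in COO form" can be pictured as scatter-based row/column normalization over the retained triplets. The sketch below is an assumption about the mechanics, not the paper's CUDA kernel; names, the iteration count, and the temperature are placeholders.

```python
import torch

def sinkhorn_coo(rows, cols, vals, n, m, temperature=0.05, iters=20):
    """Sinkhorn row/column normalization on COO triplets (illustrative).

    rows, cols: (K,) indices of retained pairs; vals: (K,) distances.
    Returns (K,) transport weights, approximately doubly stochastic on the
    stored support; gradients flow through `vals` only, mirroring the
    "preserves gradients on the stored support" behavior described above.
    """
    p = torch.exp(-vals / temperature)  # unnormalized assignment weights
    for _ in range(iters):
        # Row pass: divide each entry by its row's total mass on the support.
        row_sum = torch.zeros(n, device=p.device).scatter_add_(0, rows, p)
        p = p / row_sum[rows]
        # Column pass: same, along the other axis.
        col_sum = torch.zeros(m, device=p.device).scatter_add_(0, cols, p)
        p = p / col_sum[cols]
    return p
```

Every retained row and column has at least one support entry by construction of the pruning step, so the scatter sums are strictly positive and the divisions are safe.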
Problem

Research questions and friction points this paper is trying to address.

Optimizes APML for large-batch 3D point cloud learning
Reduces memory scaling from quadratic to near-linear
Preserves geometric fidelity while cutting peak GPU memory by 99.9% (see the estimate after this list)
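To put the quadratic-to-near-linear claim in scale, a quick back-of-the-envelope estimate; the point count, batch size, and per-point candidate count below are illustrative assumptions, not numbers from the paper:

```python
# Illustrative memory estimate for the transport weights (float32 = 4 bytes).
N = 16_384  # points per cloud (assumed)
B = 32      # cloud pairs per batch (assumed)
k = 16      # retained candidates per point after thresholding (assumed)

dense_floats = B * N * N    # full N x N transport plan per cloud pair
sparse_floats = B * N * k   # COO support only
print(f"dense : {dense_floats * 4 / 2**30:.1f} GiB")  # -> dense : 32.0 GiB
print(f"sparse: {sparse_floats * 4 / 2**20:.1f} MiB")  # -> sparse: 32.0 MiB
# A ~1024x (i.e., ~99.9%) reduction, consistent with the headline number.
```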
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse GPU implementation with thresholded assignments
Adaptive softmax and bidirectional symmetrization in COO form (see the sketch after this list)
Near-linear memory scaling with preserved gradients
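How the symmetrized weights turn into a loss value is not spelled out on this page. One plausible minimal reading, with all names hypothetical: average the two softmax directions on a shared support, then take the transport-weighted mean distance.

```python
import torch

def sparse_apml_loss(dists, p_fwd, p_bwd):
    """Illustrative symmetric transport loss on a shared COO support.

    `p_fwd` and `p_bwd` are assignment weights from the prediction->target
    and target->prediction softmax directions, gathered onto the same
    (rows, cols) support; averaging them is one simple reading of
    "bidirectional symmetrization".
    """
    p_sym = 0.5 * (p_fwd + p_bwd)               # symmetrize both directions
    return (p_sym * dists).sum() / p_sym.sum()  # transport-weighted mean distance
```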
Sasan Sharifipour
PhD Researcher at University of Oulu
Graph Neural Networks · Complex Networks · Deep Learning · Computer Vision · Machine Learning
Constantino Álvarez Casado
Postdoctoral Researcher, University of Oulu
Computer Vision · Machine Learning · Deep Learning · Human Sensing · Digital Signal Processing
Manuel Lage Cañellas
Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Finland
Miguel Bordallo López
Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Finland