🤖 AI Summary
This work addresses the cost of constructing convex hulls of planar point sets. It presents VQhull, a vectorized parallel implementation of Quickhull for multicore CPUs that exploits vector instructions and coordinates cores in a way that minimizes data movement. On the Problem Based Benchmark Suite, VQhull improves on the state of the art by 1.6–16× in sequential runtime and 1.5–11× in parallel runtime. It reaches 85–100% of peak bandwidth on non-NUMA architectures and 66–78% on a two-CPU NUMA system, which leaves little room for further improvement. A 4× speedup on 8 cores corresponds to a parallel efficiency of only 50%, yet the energy measurements show that parallel execution may even use less energy than sequential execution, so runtime and energy consumption do not go hand in hand.
📝 Abstract
Finding the convex hull is a fundamental problem in computational geometry. Quickhull is a fast algorithm for finding convex hulls. In this paper, we present VQhull, a fast parallel implementation of Quickhull that exploits vector instructions and coordinates CPU cores in a way that minimizes data movement. This implementation obtains a sequential runtime improvement of 1.6--16x and a parallel runtime improvement of 1.5--11x compared to the state of the art on the Problem Based Benchmark Suite. VQhull achieves 85--100% of peak bandwidth on non-NUMA architectures, and 66--78% on our two-CPU NUMA system. This leaves little room for further improvements.
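For readers unfamiliar with Quickhull, the following is a minimal scalar sketch of the textbook recursion for planar points. It is not VQhull: it uses no vector instructions and no parallelism, and the names `cross`, `_hull_side`, and `quickhull` are chosen here purely for illustration. The two steps in `_hull_side` (classifying points against a line and finding the farthest one) are the kind of data-parallel loops a vectorized implementation would plausibly target.

```python
def cross(a, b, p):
    """Twice the signed area of triangle (a, b, p); positive if p lies left of a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _hull_side(points, a, b):
    """Hull vertices strictly left of the directed line a->b, ordered from a to b."""
    # Classification step: keep only the points outside the current edge.
    left = [p for p in points if cross(a, b, p) > 0]
    if not left:
        return []
    # Extreme-point step: the point farthest from a->b is a hull vertex.
    far = max(left, key=lambda p: cross(a, b, p))
    # Recurse on the two new edges; points inside triangle (a, far, b) drop out.
    return _hull_side(left, a, far) + [far] + _hull_side(left, far, b)


def quickhull(points):
    """Convex hull of 2D points (tuples), in clockwise order from the leftmost point."""
    pts = sorted(set(points))  # remove duplicates; sort to find the x-extremes
    if len(pts) < 3:
        return pts
    lo, hi = pts[0], pts[-1]
    return [lo] + _hull_side(pts, lo, hi) + [hi] + _hull_side(pts, hi, lo)


hull = quickhull([(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)])
print(hull)
```

In this sketch every recursion level rescans its candidate list, so the work is dominated by streaming over point data, which is consistent with the abstract's focus on bandwidth and data movement.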
A 4x speedup on 8 cores corresponds to a parallel efficiency of 50%. This suggests a waste of energy, but our measurements show a more complicated picture: energy usage may even be lower in parallel. Quickhull serves as a case study showing that runtime and energy consumption do not go hand in hand.
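The arithmetic behind the efficiency figure, and one way lower parallel energy can arise, can be sketched as follows. Parallel efficiency is speedup divided by core count, and energy is average power times runtime. The power and runtime values below are hypothetical numbers chosen only to illustrate the relationship; they are not measurements from the paper.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling achieved: speedup / cores."""
    return speedup / cores


def energy_joules(avg_power_watts, runtime_seconds):
    """Energy consumed at a given average power over a given runtime."""
    return avg_power_watts * runtime_seconds


# The case stated in the abstract: a 4x speedup on 8 cores.
eff = parallel_efficiency(4.0, 8)

# HYPOTHETICAL figures, not from the paper: the parallel run draws 3x the
# power of the sequential run but finishes 4x sooner.
serial_energy = energy_joules(50.0, 10.0)
parallel_energy = energy_joules(150.0, 10.0 / 4.0)

print(eff, serial_energy, parallel_energy)
```

Under these assumed numbers the parallel run uses less energy despite 50% efficiency, because its power draw grows by less than the speedup. Whether that holds on real hardware depends on measured power, which is what the paper's energy measurements address.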