🤖 AI Summary
Unstructured meshes are costly to visualize because of their irregular topology and complex connectivity: computing and storing connectivity information is a bottleneck that limits both time and memory efficiency. Existing task-parallel data structures hide this cost by precomputing connectivity while the analysis runs, but they are CPU-bound, so the data structure and the algorithm compete for the same cores. GALE (GPU-Aided Localized data structurE), the first open-source CUDA-based data structure designed for heterogeneous task parallelism, offloads the connectivity computation to GPU threads so that CPU threads can focus on the visualization algorithm. On a platform with two 20-core CPUs and an NVIDIA V100 GPU, GALE achieves up to 2.7× speedup over state-of-the-art localized data structures while maintaining memory efficiency. The main contributions are (1) the GALE data structure, (2) a heterogeneous CPU-GPU task-parallel approach to connectivity computation, and (3) an open-source implementation.
📝 Abstract
Unstructured meshes present challenges in scientific data analysis due to irregular distribution and complex connectivity. Computing and storing connectivity information is a major bottleneck for visualization algorithms, affecting both time and memory performance. Recent task-parallel data structures address this by precomputing connectivity information at runtime while the analysis algorithm executes, effectively hiding computation costs and improving performance. However, existing approaches are CPU-bound, forcing the data structure and analysis algorithm to compete for the same computational resources, limiting potential speedups. To overcome this limitation, we introduce a novel task-parallel approach optimized for heterogeneous CPU-GPU systems. Specifically, we offload the computation of mesh connectivity information to GPU threads, enabling CPU threads to focus on executing the visualization algorithm. Following this paradigm, we propose GALE (GPU-Aided Localized data structurE), the first open-source CUDA-based data structure designed for heterogeneous task parallelism. Experiments on two 20-core CPUs and an NVIDIA V100 GPU show that GALE achieves up to 2.7x speedup over state-of-the-art localized data structures while maintaining memory efficiency.
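The producer/consumer pattern described in the abstract can be illustrated with a minimal host-only C++ sketch. It is not GALE's API or implementation: the names `computeSegmentStars` and `triangleCountsPerVertex` are hypothetical, the mesh is assumed to be a plain triangle list split into fixed-size vertex segments, and `std::async` worker threads merely stand in for the GPU threads that would precompute connectivity in the real system. The "analysis" here is deliberately trivial (counting the triangles incident to each vertex) so that the overlap between precomputation and consumption is the only thing on display.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

using Triangle = std::array<int, 3>;
// For each vertex of one segment: the ids of the triangles incident to it.
using VertexStars = std::vector<std::vector<int>>;

// Producer task (stand-in for the GPU side, an assumption of this sketch):
// build the vertex-to-triangle relation for the vertices in [first, last).
VertexStars computeSegmentStars(const std::vector<Triangle>& tris,
                                int first, int last) {
    VertexStars stars(static_cast<std::size_t>(last - first));
    for (std::size_t t = 0; t < tris.size(); ++t) {
        for (int v : tris[t]) {
            if (v >= first && v < last) {
                stars[static_cast<std::size_t>(v - first)]
                    .push_back(static_cast<int>(t));
            }
        }
    }
    return stars;
}

// Consumer (stand-in for the visualization algorithm on the CPU):
// walks the segments in order and only blocks if a segment's
// connectivity has not been produced yet.
std::vector<int> triangleCountsPerVertex(const std::vector<Triangle>& tris,
                                         int numVertices, int segmentSize) {
    assert(numVertices >= 0);
    assert(segmentSize > 0);

    // Launch one asynchronous connectivity task per segment up front.
    std::vector<std::future<VertexStars>> pending;
    for (int first = 0; first < numVertices; first += segmentSize) {
        const int last = std::min(first + segmentSize, numVertices);
        pending.push_back(std::async(std::launch::async, [&tris, first, last] {
            return computeSegmentStars(tris, first, last);
        }));
    }

    // Consume the results segment by segment while later tasks still run.
    std::vector<int> counts;
    counts.reserve(static_cast<std::size_t>(numVertices));
    for (auto& segment : pending) {
        const VertexStars stars = segment.get();
        for (const auto& star : stars) {
            counts.push_back(static_cast<int>(star.size()));
        }
    }
    return counts;
}
```

For two triangles sharing an edge, `{0, 1, 2}` and `{1, 2, 3}`, calling `triangleCountsPerVertex(tris, 4, 2)` returns the counts 1, 2, 2, 1. In a heterogeneous setting the producer tasks would presumably be CUDA kernels launched on a stream rather than CPU threads, which is what removes the competition for CPU cores that the abstract identifies in CPU-bound approaches.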