AI Summary
The existing RTIndeX (RX) approach, which builds database indexes on GPU ray-tracing hardware, suffers from three key bottlenecks: high per-key memory overhead, slow range lookups, and poor updateability. This paper proposes cgRX, a coarse-granular generalization of RX that indexes buckets of keys rather than individual keys. cgRX represents the buckets in a 3D scene, locates the relevant bucket by ray casting, and post-filters the keys inside it. This reduces memory overhead, yields a smaller and more efficient index structure, and enables fast range lookups and updates. Looking up a key both correctly and efficiently on this bucket representation requires firing rays in a carefully orchestrated sequence. In the evaluation, cgRX delivers 1.5–3× higher throughput relative to its memory footprint than comparable baselines that support range lookups, improves range-lookup performance over RX by up to 2×, and supports updates that are up to 5.5× faster than rebuilding the index from scratch.
Abstract
In recent work, we have shown that NVIDIA's raytracing cores on RTX video cards can be exploited to realize hardware-accelerated lookups for GPU-resident database indexes. At a high level, the concept materializes all keys as triangles in a 3D scene and indexes them. Lookups are performed by firing rays into the scene and utilizing the index structure to detect hits in a hardware-accelerated fashion. While this approach, called RTIndeX (RX for short), is indeed promising, it currently suffers from three limitations: (1) significant memory overhead per key, (2) slow range lookups, and (3) poor updateability. In this work, we show that all three problems can be tackled by a single design change: generalizing RX to become a coarse-granular index, cgRX. Instead of indexing individual keys, cgRX indexes buckets of keys, which are post-filtered after retrieval. This drastically reduces the memory overhead, leads to the generation of a smaller and more efficient index structure, and enables fast range lookups as well as updates. We will see that representing the buckets in 3D space such that the lookup of a key is performed both correctly and efficiently requires the careful orchestration of firing rays in a specific sequence. Our experimental evaluation shows that cgRX offers the most bang for the buck(et) by providing a throughput in relation to the memory footprint that is 1.5-3x higher than for the comparable range-lookup-supporting baselines. At the same time, cgRX improves the range-lookup performance over RX by up to 2x and offers practical updateability that is up to 5.5x faster than rebuilding from scratch.
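The coarse-granular idea described above (index one entry per bucket, then post-filter inside the retrieved bucket) can be illustrated with a small CPU-only sketch. This is not the authors' implementation: the class name `CoarseBucketIndex`, the `bucket_size` parameter, and the choice of the largest key as bucket representative are all assumptions made for illustration, and a plain binary search stands in for the hardware-accelerated ray cast that locates a bucket in the 3D scene.

```python
from bisect import bisect_left


class CoarseBucketIndex:
    """Toy CPU stand-in for a coarse-granular index (hypothetical, for illustration).

    Only one representative per bucket is indexed; the keys inside a
    bucket are checked by a linear post-filter after retrieval.
    """

    def __init__(self, keys, bucket_size=4):
        if bucket_size < 1:
            raise ValueError("bucket_size must be positive")
        self.keys = sorted(keys)
        self.bucket_size = bucket_size
        # Assumption: each bucket is represented by its largest key.
        self.reps = [
            self.keys[min(start + bucket_size, len(self.keys)) - 1]
            for start in range(0, len(self.keys), bucket_size)
        ]

    def _find_bucket(self, key):
        # Stand-in for the ray cast: first bucket whose representative >= key.
        return bisect_left(self.reps, key)

    def lookup(self, key):
        """Return the position of key in the sorted key array, or None."""
        bucket = self._find_bucket(key)
        if bucket == len(self.reps):
            return None  # key is larger than every indexed key
        start = bucket * self.bucket_size
        end = min(start + self.bucket_size, len(self.keys))
        # Post-filter: scan only the keys of the retrieved bucket.
        for pos in range(start, end):
            if self.keys[pos] == key:
                return pos
        return None

    def range_lookup(self, lo, hi):
        """Return all keys k with lo <= k <= hi."""
        bucket = self._find_bucket(lo)
        if bucket == len(self.reps):
            return []
        result = []
        # One bucket retrieval, then a sequential scan over the sorted keys.
        for pos in range(bucket * self.bucket_size, len(self.keys)):
            k = self.keys[pos]
            if k > hi:
                break
            if k >= lo:
                result.append(k)
        return result


idx = CoarseBucketIndex([5, 1, 9, 3, 7, 11, 13, 2, 8], bucket_size=3)
print(idx.lookup(7))
print(idx.range_lookup(3, 9))
```

In this sketch the indexed structure (`reps`) grows with the number of buckets rather than the number of keys, and a range lookup needs a single bucket retrieval followed by a sequential scan. That mirrors, in simplified form, why indexing buckets can reduce memory overhead and help range lookups; the actual 3D bucket representation and ray sequence in cgRX are more involved.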