CuRast: Cuda-Based Software Rasterization for Billions of Triangles

📅 2026-04-23

🤖 AI Summary
This work proposes an efficient software rasterization method for dense, opaque meshes of hundreds of millions to billions of triangles, as commonly encountered in photogrammetry and related applications, without requiring prebuilt acceleration structures. The approach uses a three-stage CUDA pipeline: small triangles are rasterized directly in the first stage, using atomicMin operations to record the nearest fragments, while larger triangles are deferred to the second and third stages. Compared to Vulkan hardware rasterization, the method is up to 2–5× faster for scenes of unique (non-instanced) geometry and up to 12× faster with instanced rendering. For low-polygon-count meshes, however, Vulkan remains roughly an order of magnitude faster.
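
The split between "small" and "large" triangles can be illustrated with a short CPU-side C++ sketch. The summary does not state how size is measured, so the criterion used here (screen-space bounding-box extent compared against a pixel threshold) is an assumption, and the names `ScreenTriangle`, `classify`, `binTriangles`, and `maxExtentPx` are hypothetical rather than taken from CuRast.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec2 { float x, y; };

// A triangle already projected to pixel coordinates.
struct ScreenTriangle { Vec2 a, b, c; };

enum class Stage { Direct, Deferred };

// Assumed criterion: a triangle is "small" if its screen-space bounding box
// spans at most maxExtentPx pixels along both axes.
inline Stage classify(const ScreenTriangle& t, float maxExtentPx) {
    assert(maxExtentPx > 0.0f);
    const float minX = std::min({t.a.x, t.b.x, t.c.x});
    const float maxX = std::max({t.a.x, t.b.x, t.c.x});
    const float minY = std::min({t.a.y, t.b.y, t.c.y});
    const float maxY = std::max({t.a.y, t.b.y, t.c.y});
    const bool isSmall = (maxX - minX) <= maxExtentPx && (maxY - minY) <= maxExtentPx;
    return isSmall ? Stage::Direct : Stage::Deferred;
}

// Indices of triangles handled immediately versus forwarded to later stages.
struct Binned {
    std::vector<int> direct;
    std::vector<int> deferred;
};

inline Binned binTriangles(const std::vector<ScreenTriangle>& tris, float maxExtentPx) {
    Binned out;
    for (int i = 0; i < static_cast<int>(tris.size()); ++i) {
        if (classify(tris[i], maxExtentPx) == Stage::Direct) {
            out.direct.push_back(i);
        } else {
            out.deferred.push_back(i);
        }
    }
    return out;
}
```

On a GPU this decision would presumably be made per thread inside the first-stage kernel, with deferred triangles appended to a queue for the later stages. The sequential loop above only shows the classification logic.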

📝 Abstract
Previous work shows that small triangles can be rasterized efficiently with compute shaders. Building on this insight, we explore how far this can be pushed for massive triangle datasets without the need to construct acceleration structures in advance.

Method: A 3-stage rasterization pipeline first rasterizes small triangles directly in stage 1, using atomicMin to store the closest fragments. Larger triangles are forwarded to stages 2 and 3.

Results: CuRast can render models with hundreds of millions of triangles up to 2-5x (unique) or up to 12x (instanced) faster than Vulkan. Vulkan remains an order of magnitude faster for low-poly meshes.

Limitations: We currently focus on dense, opaque meshes that you would typically obtain from photogrammetry/3D reconstruction. Blending/Transparency is not yet supported, and scenes with thousands of low-poly meshes are not implemented efficiently.

Future Work: To make it suitable for games and a wider range of use cases, future work will need to (1) optimize handling of scenes with tens of thousands of nodes/meshes, (2) add support for hierarchical clustered LODs such as those produced by Meshoptimizer, (3) add support for transparency, likely in its own stage so as to keep opaque rasterization untouched and fast.

Source Code: https://github.com/m-schuetz/CuRast
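
To make the atomicMin step concrete, here is a minimal CPU-side C++ sketch of one common way to resolve visibility with a single atomic minimum: depth and triangle id are packed into one 64-bit key, so the smallest key at a pixel is the closest fragment. The abstract does not describe CuRast's actual buffer layout, so the 32/32-bit packing and the names `packFragment`, `atomicMinU64`, and `Framebuffer` are assumptions for illustration. In CUDA, the compare-and-swap loop would be replaced by the built-in `atomicMin` overload for `unsigned long long int`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Pack depth (high 32 bits) and triangle id (low 32 bits) into one key.
// For non-negative IEEE-754 floats the bit pattern sorts like the value,
// so comparing keys as unsigned integers compares depth first.
inline uint64_t packFragment(float depth, uint32_t triangleId) {
    assert(depth >= 0.0f);
    uint32_t depthBits;
    std::memcpy(&depthBits, &depth, sizeof(depthBits));
    return (static_cast<uint64_t>(depthBits) << 32) | triangleId;
}

inline uint32_t unpackTriangleId(uint64_t key) {
    return static_cast<uint32_t>(key & 0xFFFFFFFFu);
}

inline float unpackDepth(uint64_t key) {
    const uint32_t depthBits = static_cast<uint32_t>(key >> 32);
    float depth;
    std::memcpy(&depth, &depthBits, sizeof(depth));
    return depth;
}

// CPU stand-in for a 64-bit atomic minimum: keep retrying the swap
// while the candidate is still smaller than the stored value.
inline void atomicMinU64(std::atomic<uint64_t>& slot, uint64_t candidate) {
    uint64_t current = slot.load(std::memory_order_relaxed);
    while (candidate < current &&
           !slot.compare_exchange_weak(current, candidate, std::memory_order_relaxed)) {
        // On failure, `current` holds the latest stored value and is re-checked.
    }
}

struct Framebuffer {
    int width;
    int height;
    std::vector<std::atomic<uint64_t>> pixels;

    Framebuffer(int w, int h)
        : width(w), height(h), pixels(static_cast<size_t>(w) * static_cast<size_t>(h)) {
        // UINT64_MAX marks "no fragment yet"; every real key is smaller.
        for (auto& p : pixels) p.store(UINT64_MAX, std::memory_order_relaxed);
    }

    // Many threads may call this concurrently for the same pixel.
    void writeFragment(int x, int y, float depth, uint32_t triangleId) {
        const size_t index = static_cast<size_t>(y) * static_cast<size_t>(width) + static_cast<size_t>(x);
        atomicMinU64(pixels[index], packFragment(depth, triangleId));
    }
};
```

Writing fragments with depths 5.0, 2.0, and 3.0 to the same pixel leaves the key of the 2.0 fragment in the buffer, regardless of write order. With this packing, fragments of exactly equal depth are resolved in favor of the smaller triangle id.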
Problem

Research questions and friction points this paper is trying to address.

software rasterization
massive triangle datasets
CUDA
acceleration structures
real-time rendering
Innovation

Methods, ideas, or system contributions that make the work stand out.

CUDA-based rasterization
massive triangle rendering
compute shader rasterization
atomicMin fragment storage
software rasterization pipeline