🤖 AI Summary
This work addresses the high cost of computing sparse matrix orderings for linear systems arising from triangle meshes. We propose an ordering algorithm that produces nested-dissection-style permutations and builds elimination trees quickly by deliberately relaxing constraints on partition balance and separator optimality. It combines patch-level local orderings with a compact quotient-graph ordering of the separators. The method trades a controlled degradation in ordering quality for a substantial reduction in ordering time, preserving the structure that sparse Cholesky factorization requires while avoiding the most expensive phases of the ordering process. Integrated into vendor-maintained CPU and GPU sparse Cholesky solvers, it reduces ordering time across graphics applications and improves sparse Cholesky solve performance by up to 6.27×.
📝 Abstract
We present a fast sparse matrix permutation algorithm tailored to linear systems arising from triangle meshes. Our approach produces nested-dissection-style permutations while significantly reducing the runtime overhead of computing the permutation. Rather than enforcing strict balance and separator optimality, the algorithm deliberately relaxes these requirements to favor fast partitioning and efficient elimination-tree construction. Our method decomposes the permutation into patch-level local orderings and a compact quotient-graph ordering of separators, preserving the essential structure required by sparse Cholesky factorization while avoiding its most expensive components. We integrate our algorithm into vendor-maintained sparse Cholesky solvers on both CPUs and GPUs. Across a range of graphics applications, including single and repeated factorizations, our method reduces permutation time and improves sparse Cholesky solve performance by up to 6.27×.
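The split into patch-level orderings and a separator ordering can be illustrated with a toy one-level sketch. This is not the authors' algorithm: the function name `patch_separator_ordering`, the rule for choosing separator vertices, and the use of plain sorted order as a stand-in for both the local ordering and the quotient-graph ordering are all assumptions made for illustration. The patch assignment `patch_of` is taken as given, with no attempt at balance.

```python
from collections import defaultdict


def patch_separator_ordering(adjacency, patch_of):
    """Order patch interiors first and separator vertices last.

    adjacency: dict mapping each vertex to its neighbors (symmetric graph).
    patch_of:  dict mapping each vertex to a patch id (hypothetical input,
               e.g. from any fast, possibly unbalanced partition).
    """
    # Assumed separator rule: a vertex joins the separator if it touches a
    # lower-numbered patch. Every edge between two patches then has at least
    # one endpoint in the separator, so patch interiors are decoupled.
    separator = {
        v for v, nbrs in adjacency.items()
        if any(patch_of[u] < patch_of[v] for u in nbrs)
    }

    # Group the remaining (interior) vertices by patch.
    interiors = defaultdict(list)
    for v in adjacency:
        if v not in separator:
            interiors[patch_of[v]].append(v)

    # Local ordering per patch: sorted order stands in for a real
    # fill-reducing local ordering.
    perm = []
    for patch in sorted(interiors):
        perm.extend(sorted(interiors[patch]))

    # Separators are eliminated last: sorted order stands in for an
    # ordering computed on the quotient graph of separators.
    perm.extend(sorted(separator))
    return perm, separator


# Toy example: a path graph 0-1-2-3-4-5 split into two patches.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
patch_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
perm, separator = patch_separator_ordering(adjacency, patch_of)
print(perm)       # [0, 1, 2, 4, 5, 3]
print(separator)  # {3}
```

Because no edge connects the interiors of two different patches, eliminating the interiors creates no fill between patches; all coupling is deferred to the separator block at the end of the permutation, which is the structure a nested-dissection-style ordering relies on.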