🤖 AI Summary
This work addresses DFA minimization, equivalence checking, and inclusion checking, three classical automata-theoretic problems, by proposing massively parallel GPU-accelerated algorithms. Methodologically, it integrates parallel partition refinement with a partial transitive closure computation to avoid the global synchronization bottleneck inherent in Hopcroft's algorithm; the implementation further draws on the GPUexplore framework, using its lockless hash table and fine-grained parallel work distribution. Theoretically, the novel algorithm achieves better run-time complexity on specific benchmark classes. Experimentally, it significantly outperforms existing GPU-based approaches: up to 8.2× speedup in DFA minimization and up to 5.6× acceleration in equivalence and inclusion checking on large-scale DFAs. These results demonstrate that hardware-aware algorithmic restructuring, rather than mere porting of asymptotically optimal sequential algorithms, more effectively harnesses GPU parallelism.
📝 Abstract
We study parallel algorithms for the minimisation and equivalence checking of Deterministic Finite Automata (DFAs). Regarding DFA minimisation, we implement four different massively parallel algorithms on Graphics Processing Units~(GPUs). Our results confirm the expectation that the algorithm with the theoretically best time complexity is not practically suitable to run on GPUs due to the large amount of resources needed. We empirically verify that parallel partition refinement algorithms from the literature perform better in practice, even though their time complexity is worse. Furthermore, we introduce a novel algorithm based on partition refinement with an extra parallel partial transitive closure step, and show that on specific benchmarks it has better run-time complexity and performs better in practice.
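For intuition, partition refinement repeatedly splits blocks of states until no two states in a block can be distinguished by their transitions. The following is a minimal sequential sketch of Moore-style partition refinement, not the paper's GPU algorithm; the state names, transition table, and `minimise` helper are illustrative assumptions.

```python
# Sketch of sequential Moore-style partition refinement for DFA minimisation.
# Illustrative only: this is the sequential baseline idea, not the paper's
# massively parallel GPU variant. All inputs below are hypothetical examples.

def minimise(states, alphabet, delta, accepting):
    """Map each state to a block id of the coarsest partition that is
    stable under all transitions (equivalent states share a block)."""
    # Initial partition: accepting vs. non-accepting states.
    block = {q: (1 if q in accepting else 0) for q in states}
    changed = True
    while changed:
        changed = False
        # Signature of a state: its own block plus its successors' blocks.
        sig = {q: (block[q],) + tuple(block[delta[q][a]] for a in alphabet)
               for q in states}
        # Renumber blocks so states with equal signatures share a block.
        ids, new_block = {}, {}
        for q in states:
            if sig[q] not in ids:
                ids[sig[q]] = len(ids)
            new_block[q] = ids[sig[q]]
        if new_block != block:
            block, changed = new_block, True
    return block

# Example DFA over {a}: states 1 and 2 are both accepting and loop
# between each other, so they end up in the same block.
states = [0, 1, 2]
delta = {0: {'a': 1}, 1: {'a': 2}, 2: {'a': 1}}
blocks = minimise(states, ['a'], delta, accepting={1, 2})
```

In parallel versions, the signature computation and block renumbering are the natural per-state work items; the refinement rounds are what requires synchronization.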
In addition, we address checking the language equivalence and inclusion of two DFAs. We consider the Hopcroft-Karp algorithm and explain how a variant of it can be parallelised for GPUs. We note that these problems can be encoded for the GPU-accelerated model checker GPUexplore, allowing the use of its lockless hash table and fine-grained parallel work distribution mechanism.
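The core of the Hopcroft-Karp equivalence check can be sketched as follows: starting from the pair of initial states, tentatively merge paired states in a union-find structure and propagate the pairing along matching alphabet symbols; equivalence fails only if a merged pair disagrees on acceptance. This is a hedged sequential sketch under assumed input conventions (both DFAs share one transition table with disjoint state names), not the paper's GPU-parallel variant.

```python
# Sketch of the Hopcroft-Karp equivalence check via union-find.
# Sequential and illustrative; the paper parallelises a variant of this
# loop on GPUs. State names and inputs below are hypothetical.

from collections import deque

def equivalent(init1, init2, delta, accepting, alphabet):
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    todo = deque([(init1, init2)])
    while todo:
        p, q = todo.popleft()
        rp, rq = find(p), find(q)
        if rp == rq:
            continue  # already assumed equivalent
        # Two states in one class must agree on acceptance.
        if (p in accepting) != (q in accepting):
            return False
        parent[rp] = rq  # merge the classes
        # Successors under each symbol must also be equivalent.
        for a in alphabet:
            todo.append((delta[p][a], delta[q][a]))
    return True

# Example: 'p0' (self-loop) and 'q0' <-> 'q1' all accepting over {a},
# so both automata accept every string.
delta = {'p0': {'a': 'p0'}, 'q0': {'a': 'q1'}, 'q1': {'a': 'q0'}}
accepting = {'p0', 'q0', 'q1'}
result = equivalent('p0', 'q0', delta, accepting, ['a'])
```

Inclusion checking follows the same pattern with the acceptance test weakened to an implication, which is one reason the authors can encode both problems for the same GPU machinery.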