AI Summary
This work addresses the slow training and low GPU efficiency of Tsetlin Machine-based static embedding models by proposing FastOmniTMAE, which introduces the first two-stage parallelization strategy for Tsetlin Machine-based embedding training. By decoupling the originally sequential training process into a two-stage parallel process of evaluation and update, the method significantly accelerates training while preserving embedding quality. A reusable hardware accelerator is also designed for deployment on resource-constrained SoC-FPGA platforms. Experiments show up to a 5× training speedup on classification tasks while keeping embedding quality comparable to the original approach. On hardware, the accelerator reaches similarity scores of 0.669 on a resource-constrained FPGA and 0.696 on an UltraScale+ SoC.
Abstract
Embedding models in natural language processing (NLP) increasingly rely on deep architectures such as BERT, while simpler models such as Word2Vec provide efficient representations but limited interpretability. The Tsetlin Machine (TM) offers an alternative logic-based learning paradigm. Omni TM Autoencoder (Omni TM-AE) applies this paradigm to static embedding by exploiting automaton state distributions within a single clause layer, but its training process remains slow. In this work, we propose FastOmniTMAE, a reformulation of Omni TM-AE that replaces sequential training dependencies with a two-stage parallel process: evaluation and update. Using a Single-Run Multi-Environment Benchmark covering classification, similarity, and clustering, FastOmniTMAE achieves up to 5$\times$ faster training in classification while maintaining comparable embedding quality under both Spearman and Kendall similarity measures. To address the limited efficiency of TM training on conventional GPUs, we further implement FastOmniTMAE as a reusable accelerator on SoC-FPGA platforms. The Multi-Hardware Benchmark shows that FastOmniTMAE achieves similarity scores of 0.669 on a resource-constrained FPGA and 0.696 on an UltraScale+ SoC, demonstrating efficient logic-based embedding training with a small hardware footprint.
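The abstract reports embedding quality under Spearman and Kendall similarity measures. As a rough illustration of how such scores are commonly obtained, the sketch below ranks word pairs by the cosine similarity of their embeddings and correlates that ranking with reference similarity ratings. The embeddings, word pairs, and ratings are made-up placeholders, and the helper names (`cosine`, `ranks`, `spearman`, `kendall_tau_a`) are hypothetical; the paper's benchmark may use different data, tie handling, or a library implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                score += 1
            elif product < 0:
                score -= 1
    return score / (n * (n - 1) / 2)


# Placeholder data (assumption): toy embeddings and reference ratings.
embeddings = {
    "cat": [1, 0, 1, 1],
    "dog": [1, 0, 1, 0],
    "car": [0, 1, 0, 0],
    "truck": [0, 1, 0, 1],
}
rated_pairs = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.5),
    ("cat", "car", 1.5),
    ("dog", "truck", 2.0),
]

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in rated_pairs]
reference_scores = [rating for _, _, rating in rated_pairs]

print("Spearman:", spearman(model_scores, reference_scores))
print("Kendall tau-a:", kendall_tau_a(model_scores, reference_scores))
```

Both measures depend only on the ordering of the pairs, not on the raw similarity values. That makes it possible to compare scores between models whose similarity values are on different scales.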