FastOmniTMAE: Parallel Clause Learning for Scalable and Hardware-Efficient Tsetlin Embeddings

📅 2026-05-07

🤖 AI Summary
This work addresses the slow training and low GPU efficiency of static embedding models based on the Tsetlin Machine. It proposes FastOmniTMAE, which introduces a two-stage parallelization strategy for Tsetlin Machine–based embedding training. By decoupling the originally sequential training process into parallel evaluation and update phases, the method accelerates training while keeping embedding quality comparable to the original approach. A reusable hardware accelerator is also designed for deployment on SoC-FPGA platforms. Experimental results show up to a 5× training speedup on classification tasks, with embedding similarity scores of 0.669 on a resource-constrained FPGA and 0.696 on an UltraScale+ SoC.
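To make the evaluation/update split concrete, the following is a minimal toy sketch, not the FastOmniTMAE algorithm itself. It assumes that "evaluation" means a read-only pass over frozen automaton states that proposes per-literal state changes, and that "update" means applying those changes afterwards. The names `clause_output`, `evaluate`, and `train_step`, the deterministic feedback rule, and the threshold of 0 are all hypothetical simplifications; real Tsetlin Machine feedback is stochastic and the paper's actual rules are not given here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy setup: each clause holds one automaton state per literal.
# A literal counts as "included" in the clause when its state exceeds THRESHOLD.
THRESHOLD = 0

def clause_output(states, literals):
    """Conjunction of all included literals (an empty clause outputs 1)."""
    return int(all(lit for s, lit in zip(states, literals) if s > THRESHOLD))

def evaluate(states, literals, target):
    """Stage 1: read-only pass that proposes per-literal state deltas."""
    if target == 1:
        # Toy reinforcement: move toward including true literals
        # and excluding false ones.
        return [1 if lit else -1 for lit in literals]
    if clause_output(states, literals) == 1:
        # Toy penalty: the clause fired on a negative example, so move
        # false literals toward inclusion to block it next time.
        return [0 if lit else 1 for lit in literals]
    return [0] * len(literals)

def train_step(clauses, literals, target, workers=4):
    """One two-stage step: evaluate all clauses, then apply all updates."""
    # Stage 1: every clause is evaluated against the same frozen snapshot,
    # so the evaluations have no dependencies on each other.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        deltas = list(pool.map(lambda s: evaluate(s, literals, target), clauses))
    # Stage 2: apply the proposed deltas; clauses do not interact here either.
    return [[s + d for s, d in zip(states, delta)]
            for states, delta in zip(clauses, deltas)]

clauses = [[0, 0], [1, -1]]
updated = train_step(clauses, literals=[1, 0], target=1)
```

Python threads do not give a real speedup for this workload; the thread pool only illustrates that stage 1 never writes to shared state, which is the property a GPU or FPGA implementation could exploit.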
📝 Abstract
Embedding models in natural language processing (NLP) increasingly rely on deep architectures such as BERT, while simpler models such as Word2Vec provide efficient representations but limited interpretability. The Tsetlin Machine (TM) offers an alternative logic-based learning paradigm. Omni TM Autoencoder (Omni TM-AE) applies this paradigm to static embedding by exploiting automaton state distributions within a single clause layer, but its training process remains slow. In this work, we propose FastOmniTMAE, a reformulation of Omni TM-AE that replaces sequential training dependencies with a two-stage parallel process: evaluation and update. Using a Single-Run Multi-Environment Benchmark covering classification, similarity, and clustering, FastOmniTMAE achieves up to 5$\times$ faster training in classification while maintaining comparable embedding quality under both Spearman and Kendall similarity measures. To address the limited efficiency of TM training on conventional GPUs, we further implement FastOmniTMAE as a reusable accelerator on SoC-FPGA platforms. The Multi-Hardware Benchmark shows that FastOmniTMAE achieves similarity scores of 0.669 on a resource-constrained FPGA and 0.696 on an UltraScale+ SoC, demonstrating efficient logic-based embedding training with a small hardware footprint.
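The Spearman and Kendall measures mentioned above are rank correlations, typically computed between model similarity scores and reference scores for the same word pairs. The sketch below is a dependency-free illustration under stated assumptions: the input scores are made up, ties are assumed absent (so the simple Spearman formula and Kendall tau-a apply), and the helper names `ranks`, `spearman`, and `kendall` are hypothetical. It does not reproduce the paper's benchmark.

```python
from itertools import combinations

def ranks(values):
    """Assign ranks 1..n in ascending order (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n(n^2-1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)  # +1 concordant, -1 discordant
    return score / len(pairs)

# Made-up scores for four word pairs (illustrative only).
reference_scores = [1.0, 2.0, 3.0, 4.0]
model_scores = [0.1, 0.3, 0.2, 0.9]
rho = spearman(reference_scores, model_scores)
tau = kendall(reference_scores, model_scores)
```

For these inputs one pair of items is ranked in the opposite order, giving rho of about 0.8 and tau of about 0.667. For real evaluations with ties, a library routine such as `scipy.stats.spearmanr` or `scipy.stats.kendalltau` is the safer choice.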
Problem

Research questions and friction points this paper is trying to address.

Tsetlin Machine
embedding
training efficiency
hardware efficiency
static embedding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tsetlin Machine
parallel clause learning
hardware-efficient embedding
FPGA acceleration
logic-based NLP