Fast and Compact Tsetlin Machine Inference on CPUs Using Instruction-Level Optimization

📅 2025-10-17
🤖 AI Summary
To address the slow inference speed and high resource overhead of Tsetlin Machines (TMs) on CPU platforms, this paper proposes an instruction-level bitwise optimization method for efficient inference. The approach operates post-training without requiring model retraining. Its core contributions are: (1) a dynamic clause reordering strategy based on literal frequency analysis to maximize early termination probability of AND clauses; (2) a fine-grained, clause-driven early-exit mechanism; and (3) a compact model representation and parallel bitwise computation leveraging ARM-specific bit-manipulation instructions. Experimental evaluation on ARM platforms demonstrates up to 96.71% reduction in inference latency, while maintaining minimal code footprint. The method achieves a favorable trade-off among high throughput, low latency, and low memory and computational overhead—enabling practical, resource-efficient deployment of TMs on edge devices.

📝 Abstract
The Tsetlin Machine (TM) offers high-speed inference on resource-constrained devices such as CPUs. Its logic-driven operations naturally lend themselves to parallel execution on modern CPU architectures. Motivated by this, we propose an efficient software implementation of the TM that leverages instruction-level bitwise operations for compact model representation and accelerated processing. To further improve inference speed, we introduce an early-exit mechanism that exploits the TM's AND-based clause evaluation to avoid unnecessary computations. Building upon this, we propose a literal reorder strategy designed to maximize the likelihood of early exits. This strategy is applied in a post-training, pre-inference stage through statistical analysis of all literals and the corresponding actions of their associated Tsetlin Automata (TA), introducing negligible runtime overhead. Experimental results using the gem5 simulator with an ARM processor show that our optimized implementation reduces inference time by up to 96.71% compared to conventional integer-based TM implementations while maintaining comparable code density.
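The packed bitwise clause evaluation with early exit described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the word count `WORDS` and the `include_pos`/`include_neg` mask layout are hypothetical choices, standing in for the paper's compact model representation.

```c
#include <stdint.h>

#define WORDS 2  /* hypothetical: 64 boolean features packed into two 32-bit words */

/* Evaluate one AND clause over packed literals, exiting at the first
   violated word. include_pos/include_neg mark which original/negated
   features the trained TA states include in this clause. */
static int clause_output(const uint32_t input[WORDS],
                         const uint32_t include_pos[WORDS],
                         const uint32_t include_neg[WORDS]) {
    for (int w = 0; w < WORDS; ++w) {
        /* The clause fails if any included positive literal is 0
           or any included negated literal is 1. */
        if ((include_pos[w] & ~input[w]) | (include_neg[w] & input[w]))
            return 0;  /* early exit: remaining words are never inspected */
    }
    return 1;
}
```

Because a single 32-bit AND/NOT handles 32 literals at once, the early exit skips whole words of literals, which is where the latency reduction comes from.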
Problem

Research questions and friction points this paper is trying to address.

Optimizing Tsetlin Machine inference speed on CPUs
Reducing computational overhead through early exit mechanisms
Enhancing literal processing efficiency via statistical reordering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging instruction-level bitwise operations for compact representation
Introducing early exit mechanism to avoid unnecessary computations
Applying literal reorder strategy to maximize early exits
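The literal reorder idea above can be sketched as a post-training pass. This is a hedged sketch under assumptions: the statistic used here (how often each included literal evaluates to 0 on a calibration set, `zero_freq`) and the names `lit_stat_t`/`reorder_literals` are illustrative, not taken from the paper; the point is only that literals most likely to terminate an AND clause are checked first.

```c
#include <stdlib.h>

/* Hypothetical per-literal statistic gathered before inference:
   zero_freq estimates how often the literal is 0 on a calibration set. */
typedef struct { int literal; double zero_freq; } lit_stat_t;

static int by_zero_freq_desc(const void *a, const void *b) {
    double d = ((const lit_stat_t *)b)->zero_freq
             - ((const lit_stat_t *)a)->zero_freq;
    return (d > 0) - (d < 0);
}

/* order[] receives literal indices, most-likely-to-fail first,
   so clause evaluation hits an early exit as soon as possible. */
static void reorder_literals(lit_stat_t stats[], int n, int order[]) {
    qsort(stats, n, sizeof stats[0], by_zero_freq_desc);
    for (int i = 0; i < n; ++i) order[i] = stats[i].literal;
}
```

Since this runs once, between training and inference, it adds no per-sample cost, matching the paper's claim of negligible runtime overhead.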
Yefan Zeng
Microsystems Research Group, Newcastle University
Shengyu Duan
Newcastle University
Rishad Shafik
Professor of Microelectronic Systems, Newcastle University, UK
Machine Learning Hardware, Energy-Aware Computing, HW/SW Co-design
Alex Yakovlev
Microsystems Research Group, Newcastle University