A Comparative Study of Feature Selection in Tsetlin Machines

📅 2025-08-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Tsetlin Machines (TMs) lack established mechanisms for assessing feature importance, which limits their interpretability and efficiency and can hurt generalization. This work presents a systematic study of feature selection for TMs. It proposes embedded scorers derived from clause weights and Tsetlin automaton (TA) states, which exploit the TM's clause-based logical structure to reveal interacting features. The study benchmarks these scorers on 12 datasets against classical filter methods, embedded approaches, and post-hoc explainers (e.g., SHAP, LIME), and uses the ROAR and ROAD evaluation protocols to assess the causal impact of removing features. Experiments show that the TM-internal scorers perform competitively, and that the simpler ones retain similar accuracy at a fraction of the computational cost of the heavier alternatives.

📝 Abstract

Feature selection (FS) is crucial for improving model interpretability, reducing complexity, and sometimes enhancing accuracy. The recently introduced Tsetlin machine (TM) offers interpretable clause-based learning but lacks established tools for estimating feature importance. In this paper, we adapt and evaluate a range of FS techniques for TMs: classical filter and embedded methods, post-hoc explanation methods originally developed for neural networks (e.g., SHAP and LIME), and a novel family of embedded scorers derived from TM clause weights and Tsetlin automaton (TA) states. We benchmark all methods across 12 datasets, using evaluation protocols such as Remove and Retrain (ROAR) and Remove and Debias (ROAD) to assess causal impact. Our results show that TM-internal scorers not only perform competitively but also exploit the interpretability of clauses to reveal interacting feature patterns. Simpler TM-specific scorers achieve similar accuracy retention at a fraction of the computational cost. This study establishes the first comprehensive baseline for FS in TMs and paves the way for developing specialized TM-specific interpretability techniques.
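
To make the idea of a scorer built from clause weights and TA states concrete, here is a minimal sketch in plain Python. It is not the paper's scorer: the function name `tm_feature_scores`, the literal layout (positive literals first, negated literals second), and the rule that a literal is included when its TA state exceeds `n_states_per_action` are all assumptions made for illustration. Under those assumptions, a feature's score is the sum of the absolute weights of the clauses that include it in either form.

```python
def tm_feature_scores(ta_states, clause_weights, n_states_per_action):
    """Hypothetical TM-internal scorer (illustrative, not the paper's method).

    ta_states: one row per clause, 2 * n_features TA states per row.
        Assumed layout: columns [0, n_features) are the literals x_f,
        columns [n_features, 2 * n_features) are the negations NOT x_f.
    clause_weights: one signed integer weight per clause.
    n_states_per_action: assumed include threshold; a literal counts as
        included in a clause when its TA state is above this value.
    """
    n_features = len(ta_states[0]) // 2
    scores = [0.0] * n_features
    for states, weight in zip(ta_states, clause_weights):
        for f in range(n_features):
            uses_positive = states[f] > n_states_per_action
            uses_negated = states[f + n_features] > n_states_per_action
            if uses_positive or uses_negated:
                # Credit the feature with the clause's voting strength.
                scores[f] += abs(weight)
    return scores


# Toy example: 3 clauses over 3 features, threshold 100.
ta_states = [
    [150, 20, 30, 10, 10, 140],  # clause 0 includes x0 and NOT x2
    [10, 130, 10, 10, 10, 10],   # clause 1 includes x1
    [120, 10, 10, 10, 10, 10],   # clause 2 includes x0
]
clause_weights = [3, -2, 1]

scores = tm_feature_scores(ta_states, clause_weights, n_states_per_action=100)
ranking = sorted(range(len(scores)), key=lambda f: -scores[f])
print(scores, ranking)
```

Because a scorer of this kind only reads the state of an already trained model, it needs no extra model evaluations, which is one plausible reason such scorers are cheap compared with perturbation-based explainers.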

Problem

Research questions and friction points this paper is trying to address.

Lack of established feature importance tools for Tsetlin Machines

Adapting and evaluating FS techniques for interpretable clause-based learning

Benchmarking FS methods to assess causal impact on performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapted classical FS techniques and post-hoc explainers developed for neural networks (e.g., SHAP, LIME) to TMs

Introduced novel TM-internal scorers derived from clause weights and Tsetlin automaton (TA) states

Benchmarked methods using ROAR and ROAD protocols
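
The ROAR protocol mentioned above can be sketched as a loop: rank the features, replace the top-k with an uninformative value in both the training and test data, retrain, and record the accuracy. The sketch below is a simplified illustration, not the paper's evaluation code. A nearest-centroid classifier stands in for the Tsetlin machine, the per-feature training mean is the assumed fill value, and all function names (`roar_curve`, `mask_features`, and so on) are hypothetical.

```python
def fit_centroids(X, y):
    """Stand-in model: one mean vector per class (NOT a Tsetlin machine)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=dist)


def accuracy(centroids, X, y):
    return sum(predict(centroids, r) == t for r, t in zip(X, y)) / len(y)


def mask_features(X, removed, fill):
    """Replace every removed feature with its uninformative fill value."""
    return [[fill[j] if j in removed else v for j, v in enumerate(row)]
            for row in X]


def roar_curve(X_train, y_train, X_test, y_test, ranking, ks):
    """Retrain after removing the top-k ranked features, for each k in ks."""
    n_features = len(X_train[0])
    fill = [sum(r[j] for r in X_train) / len(X_train)
            for j in range(n_features)]
    curve = []
    for k in ks:
        removed = set(ranking[:k])
        model = fit_centroids(mask_features(X_train, removed, fill), y_train)
        acc = accuracy(model, mask_features(X_test, removed, fill), y_test)
        curve.append((k, acc))
    return curve


def make_data(n):
    """Toy data: feature 0 equals the label; features 1 and 2 are unrelated."""
    X = [[float(i % 2), float((i // 2) % 2), float((i // 4) % 2)]
         for i in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y


X_train, y_train = make_data(40)
X_test, y_test = make_data(16)
curve = roar_curve(X_train, y_train, X_test, y_test,
                   ranking=[0, 1, 2], ks=[0, 1, 2])
print(curve)
```

On this toy data, removing feature 0 drops accuracy from 1.0 to 0.5 (chance level), which is the signature of a ranking that put the truly important feature first. A poor ranking would leave accuracy high until feature 0 is finally removed.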
