LLM-Guided Semantic Bootstrapping for Interpretable Text Classification with Tsetlin Machines

📅 2026-04-13

🤖 AI Summary
This work asks how to improve the semantic generalization of symbolic models without giving up their interpretability. It proposes a purely symbolic approach that uses a large language model (LLM) only offline: given a class label, the LLM generates sub-intents that guide synthetic data creation through a three-stage curriculum (seed, core, enriched). A Non-Negated Tsetlin Machine (NTM) trained on this synthetic data yields high-confidence literals that serve as interpretable semantic cues, and injecting those cues into real data aligns the clause logic of a Tsetlin Machine (TM) with LLM-inferred semantics. No embeddings or runtime LLM calls are needed. The framework is presented as the first to fully symbolize LLM semantic priors and integrate them into the Tsetlin Machine. Across multiple text classification tasks, it improves on the vanilla TM in both accuracy and interpretability and reaches performance comparable to BERT while remaining fully symbolic and efficient.

📝 Abstract
Pretrained language models (PLMs) like BERT provide strong semantic representations but are costly and opaque, while symbolic models such as the Tsetlin Machine (TM) offer transparency but lack semantic generalization. We propose a semantic bootstrapping framework that transfers LLM knowledge into symbolic form, combining interpretability with semantic capacity. Given a class label, an LLM generates sub-intents that guide synthetic data creation through a three-stage curriculum (seed, core, enriched), expanding semantic diversity. A Non-Negated TM (NTM) learns from these examples to extract high-confidence literals as interpretable semantic cues. Injecting these cues into real data enables a TM to align clause logic with LLM-inferred semantics. Our method requires no embeddings or runtime LLM calls, yet equips symbolic models with pretrained semantic priors. Across multiple text classification tasks, it improves interpretability and accuracy over vanilla TM, achieving performance comparable to BERT while remaining fully symbolic and efficient.
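The cue-injection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how literals are scored or how cues are injected, so everything here is an assumption. A simple word-frequency filter stands in for the NTM's high-confidence literal extraction, and injection is modeled as appending a marker token (the hypothetical `CUE_<label>`) to each real document. The names `extract_cues`, `inject_cues`, and `clause_fires`, the thresholds, and the toy sentences are all illustrative.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; any tokenizer would do)."""
    return text.lower().split()


def extract_cues(synthetic, min_precision=0.9, min_count=2):
    """Frequency-based stand-in for NTM literal extraction (NOT a Tsetlin Machine).

    A word becomes a cue for a label if it occurs in at least `min_count`
    synthetic documents of that label and at least `min_precision` of the
    documents containing it carry that label.
    """
    word_label = Counter()  # (word, label) -> number of documents
    word_total = Counter()  # word -> number of documents
    for text, label in synthetic:
        for word in set(tokenize(text)):
            word_label[(word, label)] += 1
            word_total[word] += 1
    cues = {}
    for (word, label), n in word_label.items():
        if n >= min_count and n / word_total[word] >= min_precision:
            cues.setdefault(label, set()).add(word)
    return cues


def inject_cues(text, cues):
    """Append a marker token for every label whose cue words occur in the text."""
    tokens = tokenize(text)
    present = set(tokens)
    markers = [f"CUE_{label}" for label in sorted(cues) if present & cues[label]]
    return tokens + markers


def clause_fires(clause, tokens):
    """A non-negated clause is a conjunction of positive literals:
    it fires only if every word in the clause is present."""
    return set(clause) <= set(tokens)


# Toy "synthetic" data standing in for LLM-guided examples (hypothetical).
synthetic = [
    ("the team won the match", "sports"),
    ("the striker scored in the match", "sports"),
    ("the bank raised interest rates", "finance"),
    ("interest rates hit the market", "finance"),
]

cues = extract_cues(synthetic)
augmented = inject_cues("A thrilling match last night", cues)
print(cues)
print(augmented)
print(clause_fires(["match", "CUE_sports"], augmented))
```

Because the injected markers are ordinary tokens, a downstream symbolic learner can use them as literals in the same way as any word, which keeps every learned clause readable as a plain conjunction of word and cue presences.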
Problem

Research questions and friction points this paper is trying to address.

Interpretable Text Classification
Semantic Generalization
Tsetlin Machine
Large Language Models
Symbolic AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Bootstrapping
Tsetlin Machine
Interpretable AI
LLM Knowledge Transfer
Symbolic Learning