Boolformer: Symbolic Regression of Logic Functions with Transformers

📅 2023-09-21

🏛️ arXiv.org

📈 Citations: 5

✨ Influential: 0

🤖 AI Summary

This work addresses symbolic regression of Boolean functions. We propose Boolformer, the first end-to-end learning framework for this task: a Transformer-based sequence-to-sequence model capable of synthesizing concise logical formulas from complete truth tables, as well as fitting robust approximate expressions from noisy or sparse observations. Key innovations include logic-aware tokenization of Boolean expressions, syntax-constrained decoding to ensure grammatical validity, and a self-supervised pretraining–fine-tuning paradigm. Evaluated on real-world binary classification datasets and on gene regulatory network modeling, Boolformer is competitive with state-of-the-art genetic algorithms while running several orders of magnitude faster, and it offers an interpretable alternative to classic machine learning methods. The source code and pretrained models are publicly available.
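To make the task concrete, here is a minimal sketch of what "a concise formula from a complete truth table" means. It is a brute-force search over AND/OR/NOT formulas, not the Transformer-based method the summary describes; the function names `truth_table` and `shortest_formula`, the size measure (number of operators plus variables), and the `max_size` cutoff are all assumptions made for illustration.

```python
from itertools import product

def truth_table(f, n):
    """Outputs of f on all 2**n inputs, in the order produced by itertools.product."""
    return tuple(bool(f(*bits)) for bits in product([False, True], repeat=n))

def shortest_formula(target, n, max_size=10):
    """Exhaustively search for a smallest AND/OR/NOT formula matching `target`.

    Size counts every variable occurrence and every operator. Illustrative
    baseline only: its cost explodes with the number of variables.
    """
    inputs = list(product([False, True], repeat=n))
    seen = {}           # truth table -> smallest formula found so far
    by_size = {1: {}}   # size -> {truth table: formula} first reached at that size
    for i in range(n):
        table = tuple(x[i] for x in inputs)
        seen[table] = by_size[1][table] = f"x{i}"
    if target in seen:
        return seen[target]
    for size in range(2, max_size + 1):
        level = {}
        # Unary case: NOT applied to a formula of size - 1.
        for table, expr in by_size[size - 1].items():
            new = tuple(not v for v in table)
            if new not in seen and new not in level:
                level[new] = f"(not {expr})"
        # Binary case: operand sizes a + b = size - 1.
        for a in range(1, size - 1):
            b = size - 1 - a
            for ta, ea in by_size[a].items():
                for tb, eb in by_size[b].items():
                    for name in ("and", "or"):
                        if name == "and":
                            new = tuple(p and q for p, q in zip(ta, tb))
                        else:
                            new = tuple(p or q for p, q in zip(ta, tb))
                        if new not in seen and new not in level:
                            level[new] = f"({ea} {name} {eb})"
        seen.update(level)
        by_size[size] = level
        if target in level:
            return level[target]
    return None  # no formula within the size budget

# Example: recover a formula for XOR of two variables.
xor_table = truth_table(lambda a, b: a != b, 2)
print(shortest_formula(xor_table, 2))
```

The returned string uses Python's own `and`, `or`, and `not`, so it can be checked by evaluating it on every input row. A search like this is the kind of expensive combinatorial procedure that a learned model would aim to replace with a single forward pass.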

📝 Abstract

In this work, we introduce Boolformer, the first Transformer architecture trained to perform end-to-end symbolic regression of Boolean functions. First, we show that it can predict compact formulas for complex functions which were not seen during training, when provided a clean truth table. Then, we demonstrate its ability to find approximate expressions when provided incomplete and noisy observations. We evaluate the Boolformer on a broad set of real-world binary classification datasets, demonstrating its potential as an interpretable alternative to classic machine learning methods. Finally, we apply it to the widespread task of modelling the dynamics of gene regulatory networks. Using a recent benchmark, we show that Boolformer is competitive with state-of-the-art genetic algorithms with a speedup of several orders of magnitude. Our code and models are available publicly.
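The gene regulatory network application can be pictured with a small Boolean network. The sketch below assumes the common formalization in which each gene is on or off and all genes update synchronously; the abstract does not spell out the exact setup, and the three-gene network, its rules, and the helper names `step`, `trajectory`, and `transitions_for` are hypothetical.

```python
def step(state, rules):
    """Synchronous update: every gene's next value is computed from the current state."""
    return {gene: bool(rule(state)) for gene, rule in rules.items()}

def trajectory(state, rules, n_steps):
    """Return the list of states visited, starting with the initial state."""
    states = [state]
    for _ in range(n_steps):
        state = step(state, rules)
        states.append(state)
    return states

def transitions_for(gene, states):
    """Turn a trajectory into (input state, next value of `gene`) observations."""
    return [(tuple(prev.values()), nxt[gene]) for prev, nxt in zip(states, states[1:])]

# Hypothetical three-gene network, invented for illustration.
rules = {
    "A": lambda s: not s["C"],          # C represses A
    "B": lambda s: s["A"],              # A activates B
    "C": lambda s: s["A"] and s["B"],   # C needs both A and B
}

start = {"A": True, "B": False, "C": False}
states = trajectory(start, rules, 5)
for s in states:
    print(s)

# Each pair is one (possibly repeated) truth-table row for gene C's update rule.
print(transitions_for("C", states))
```

Under this view, inferring the network amounts to one Boolean symbolic regression problem per gene: the observed transitions form an incomplete truth table, and the goal is a compact formula for each update rule.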

Problem

Research questions and friction points this paper is trying to address.

Predicts compact Boolean formulas from truth tables

Finds approximate expressions from incomplete, noisy observations

Provides an interpretable alternative to classic machine learning methods for binary classification
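The noisy, incomplete setting in the list above can be sketched as follows. With missing rows and flipped labels, no formula needs to match every observation, so candidates are scored by accuracy instead. This is only an illustration of the problem setup, not of how Boolformer produces its predictions; the observations, the candidate list, and the names `accuracy` and `best_candidate` are made up for the example.

```python
def accuracy(formula, observations):
    """Fraction of observed (inputs, label) pairs that the formula reproduces."""
    hits = sum(bool(formula(*x)) == y for x, y in observations)
    return hits / len(observations)

def best_candidate(candidates, observations):
    """Pick the most accurate candidate.

    Candidates are listed from simplest to most complex; max() returns the
    first maximal item, so ties go to the simpler formula.
    """
    return max(candidates, key=lambda c: accuracy(c[1], observations))

# Six of the eight rows of the table for x0 AND x1 (x2 is irrelevant).
# The first label is deliberately flipped to simulate noise.
observations = [
    ((False, False, False), True),   # noisy: the clean label would be False
    ((False, True,  False), False),
    ((True,  False, True),  False),
    ((True,  True,  False), True),
    ((True,  True,  True),  True),
    ((False, True,  True),  False),
]

# Hypothetical candidate formulas, simplest first.
candidates = [
    ("x0",        lambda x0, x1, x2: x0),
    ("x1",        lambda x0, x1, x2: x1),
    ("x0 and x1", lambda x0, x1, x2: x0 and x1),
    ("x0 or x1",  lambda x0, x1, x2: x0 or x1),
]

name, formula = best_candidate(candidates, observations)
print(name, accuracy(formula, observations))
```

A short formula that explains most of the observations is also what makes the result usable as an interpretable classifier: the prediction rule can be read directly.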

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based symbolic regression for Boolean functions

Handles incomplete, noisy observations

Fast, interpretable alternative to genetic algorithms

🔎 Similar Papers

Extracting Formulae in Many-Valued Logic from Deep Neural Networks