Fitting Multilinear Polynomials for Logic Gate Networks

📅 2026-05-08
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the difficulty of training deep learnable logic gate networks, where parameter redundancy and vanishing gradients hinder learning of the interaction coefficients. The authors represent each two-input Boolean gate as a multilinear polynomial with four coefficients, which recasts parameter learning as a low-dimensional vector quantization problem. They introduce a soft vector quantization method based on the covariance Jacobian (CovJac) that avoids the coefficient starvation caused by straight-through estimators and enables stable optimization in deep architectures. Combining the multilinear encoding, soft quantization, and CovJac-driven gate selection, the approach uses four parameters per neuron instead of sixteen; across seven datasets, at least one four-parameter variant matches or exceeds the Soft-Mix baseline on each. It is also markedly more stable at depth: on CIFAR-10 at 12 layers, accuracy drops by 0.5 percentage points, versus 37.3 points for Soft-Mix.
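To make the "gates as four-coefficient polynomials" idea concrete, here is a minimal sketch that builds the 16-entry codebook and snaps a continuous coefficient vector to its nearest gate. The coefficient ordering `(c0, ca, cb, cab)`, the function names, and the use of plain Euclidean nearest-neighbour quantization are assumptions made for illustration; the paper may order or quantize differently.

```python
from itertools import product

import numpy as np


def gate_codebook():
    """Return (truth_tables, coeffs) for all 16 two-input Boolean gates.

    Each truth table is (f(0,0), f(0,1), f(1,0), f(1,1)). Each coefficient
    row is (c0, ca, cb, cab) with f(a, b) = c0 + ca*a + cb*b + cab*a*b.
    The ordering is an illustrative assumption, not taken from the paper.
    """
    tables, coeffs = [], []
    for f00, f01, f10, f11 in product((0.0, 1.0), repeat=4):
        tables.append((f00, f01, f10, f11))
        # Solve the polynomial at the four corners of {0,1}^2.
        coeffs.append((f00, f10 - f00, f01 - f00, f11 - f10 - f01 + f00))
    return np.array(tables), np.array(coeffs)


def evaluate(c, a, b):
    """Evaluate the multilinear polynomial with coefficients c at (a, b)."""
    return c[0] + c[1] * a + c[2] * b + c[3] * a * b


def nearest_gate(z, codebook):
    """Hard vector quantization: index of the closest codebook row to z."""
    return int(np.argmin(np.sum((codebook - z) ** 2, axis=1)))


TABLES, CODEBOOK = gate_codebook()
AND_IDX = 1  # truth table (0, 0, 0, 1) in the enumeration order above
XOR_IDX = 6  # truth table (0, 1, 1, 0)

print("AND:", CODEBOOK[AND_IDX])  # a*b
print("XOR:", CODEBOOK[XOR_IDX])  # a + b - 2*a*b
print("nearest to a noisy AND:", nearest_gate(np.array([0.1, 0.0, 0.1, 0.9]), CODEBOOK))
```

In this representation a neuron only needs one 4-vector `z`; the gate it implements is whichever codebook row `z` is quantized to.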
📝 Abstract
We study learnable logic gate networks that stack layers of 2-input Boolean gates to build combinational circuits. Every 2-input gate has a unique multilinear polynomial with 4 coefficients, so the 16 Boolean gates form a codebook of prototypes in a 4-dimensional space, reducing training to a vector-quantization problem. The baseline method, Soft-Mix, learns a 16-dimensional softmax over gate identities, but the codebook has rank 4: 11 of 15 simplex directions carry nullspace gradient, and at uniform initialization the backward signal vanishes exactly. We prove that no affine product reparameterization fixes the resulting interaction-coefficient starvation under STE, and show that the covariance Jacobian of soft-VQ selection bypasses it by coupling the starved coefficient to the always-active constant channel. Working in the 4-dimensional polynomial space reduces each neuron from 16 to 4 parameters. On seven datasets, at least one 4-parameter method matches or exceeds Soft-Mix on every dataset; the CovJac advantage over STE grows monotonically with interaction demand across all seven datasets. At depth, Soft-Mix collapses ($-37.3$pp on CIFAR-10 at 12 layers) while CovJac holds ($-0.5$pp on CIFAR-10, stable on MNIST).
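The rank and nullspace counts in the abstract can be checked numerically. The sketch below assumes that a Soft-Mix neuron outputs the probability-weighted mixture of the 16 gate polynomials, so its effective coefficients are `CODEBOOK.T @ p`; that reading, the coefficient ordering, and all names are illustrative assumptions rather than the paper's code.

```python
from itertools import product

import numpy as np

# 16 x 4 codebook; row = (c0, ca, cb, cab) with f(a,b) = c0 + ca*a + cb*b + cab*a*b.
CODEBOOK = np.array([
    (f00, f10 - f00, f01 - f00, f11 - f10 - f01 + f00)
    for f00, f01, f10, f11 in product((0.0, 1.0), repeat=4)
])

# Rank of the codebook itself.
rank = int(np.linalg.matrix_rank(CODEBOOK))

# A probability vector over 16 gates has 15 free directions (they sum to zero).
# Basis of that tangent space: columns e_k - e_15 for k = 0..14.
tangent = np.eye(16)[:, :15] - np.eye(16)[:, [15]]

# Directions that leave the mixed coefficients unchanged form the nullspace.
effective_rank = int(np.linalg.matrix_rank(CODEBOOK.T @ tangent))
nullity = tangent.shape[1] - effective_rank

# Uniform mixture: the mixed polynomial under p = 1/16 for every gate.
uniform_coeffs = CODEBOOK.mean(axis=0)

print("codebook rank:", rank)
print("simplex directions with no effect on the output:", nullity, "of 15")
print("coefficients at uniform initialization:", uniform_coeffs)
```

Under this reading, the uniform mixture has coefficients `(0.5, 0, 0, 0)`, so the neuron outputs the constant 0.5 and its derivatives with respect to both inputs, `ca + cab*b` and `cb + cab*a`, are zero. That is consistent with the abstract's statement that the backward signal vanishes at uniform initialization.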
Problem

Research questions and friction points this paper is trying to address.

logic gate networks
gradient vanishing
vector quantization
multilinear polynomials
straight-through estimator
Innovation

Methods, ideas, or system contributions that make the work stand out.

multilinear polynomials
logic gate networks
vector quantization
covariance Jacobian
gradient starvation
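The abstract does not spell out the soft-VQ formulation, so the following is only a sketch under an assumed one: selection probabilities are a softmax over negative squared distances from a 4-dimensional parameter `z` to the codebook rows, with a hypothetical temperature `tau`, and the neuron uses the probability-weighted mean of the codebook. Under that assumption, the Jacobian of the mixed coefficients with respect to `z` equals `(2 / tau)` times the covariance of the codebook under the selection probabilities, which is one plausible reading of "covariance Jacobian". The code checks the identity against finite differences.

```python
from itertools import product

import numpy as np

# Row = (c0, ca, cb, cab) with f(a,b) = c0 + ca*a + cb*b + cab*a*b (assumed order).
CODEBOOK = np.array([
    (f00, f10 - f00, f01 - f00, f11 - f10 - f01 + f00)
    for f00, f01, f10, f11 in product((0.0, 1.0), repeat=4)
])


def soft_vq(z, tau):
    """Assumed soft selection: softmax over -||z - c_k||^2 / tau."""
    logits = -np.sum((CODEBOOK - z) ** 2, axis=1) / tau
    p = np.exp(logits - logits.max())  # subtract max for numerical stability
    p /= p.sum()
    return p, p @ CODEBOOK


def weighted_cov(p):
    """Covariance matrix of the codebook rows under probabilities p."""
    centered = CODEBOOK - p @ CODEBOOK
    return (centered.T * p) @ centered


def cov_jacobian(z, tau):
    """Analytic Jacobian of the mixed coefficients with respect to z."""
    p, _ = soft_vq(z, tau)
    return (2.0 / tau) * weighted_cov(p)


def numeric_jacobian(z, tau, eps=1e-6):
    """Central finite differences, used only to check the analytic form."""
    jac = np.zeros((4, 4))
    for j in range(4):
        dz = np.zeros(4)
        dz[j] = eps
        jac[:, j] = (soft_vq(z + dz, tau)[1] - soft_vq(z - dz, tau)[1]) / (2 * eps)
    return jac


z0 = np.array([0.3, -0.2, 0.4, 0.1])
print(np.allclose(cov_jacobian(z0, 1.0), numeric_jacobian(z0, 1.0), atol=1e-6))

# Covariance under uniform selection: entry [0, 3] links constant and interaction terms.
print(weighted_cov(np.full(16, 1.0 / 16.0))[0, 3])
```

Under uniform selection, the covariance between the constant coefficient and the interaction coefficient is nonzero in this sketch, so a gradient arriving on the constant channel also moves the interaction coordinate of `z`. This illustrates the kind of coupling the abstract describes, without claiming to reproduce the authors' exact mechanism.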