AI Summary
This work addresses two obstacles to training deep learnable logic gate networks: parameter redundancy and vanishing gradients, which together hinder learning of the interaction coefficients. The authors represent each two-input Boolean gate as a multilinear polynomial with four coefficients, which turns parameter learning into a low-dimensional vector quantization problem. They introduce a soft vector quantization method based on the covariance Jacobian (CovJac) that avoids the interaction-coefficient starvation arising under straight-through estimators (STE), enabling stable optimization in deep architectures. The approach combines the multilinear encoding, soft quantization, and CovJac-driven gate selection. Using 4 parameters per neuron instead of 16, at least one 4-parameter method matches or surpasses the Soft-Mix baseline on each of seven datasets. The approach is also substantially more stable at depth: on CIFAR-10 at 12 layers, CovJac loses only 0.5 percentage points, whereas Soft-Mix loses 37.3.
Abstract
We study learnable logic gate networks that stack layers of 2-input Boolean gates to build combinational circuits. Every 2-input gate has a unique multilinear polynomial with 4 coefficients, so the 16 Boolean gates form a codebook of prototypes in a 4-dimensional space, reducing training to a vector-quantization problem. The baseline method, Soft-Mix, learns a 16-dimensional softmax over gate identities, but the codebook has rank 4: 11 of 15 simplex directions carry nullspace gradient, and at uniform initialization the backward signal vanishes exactly. We prove that no affine product reparameterization fixes the resulting interaction-coefficient starvation under STE, and show that the covariance Jacobian of soft-VQ selection bypasses it by coupling the starved coefficient to the always-active constant channel. Working in the 4-dimensional polynomial space reduces each neuron from 16 to 4 parameters. On seven datasets, at least one 4-parameter method matches or exceeds Soft-Mix on every dataset; the CovJac advantage over STE grows monotonically with interaction demand across all seven datasets. At depth, Soft-Mix collapses ($-37.3$pp on CIFAR-10 at 12 layers) while CovJac holds ($-0.5$pp on CIFAR-10, stable on MNIST).
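The codebook claims in the abstract can be checked by hand with a short sketch. The code below is illustrative only and is not taken from the paper: it assumes the coefficient ordering f(a, b) = c0 + c1*a + c2*b + c3*a*b, and the names `coefficients`, `evaluate`, `rank`, and `uniform_mix` are hypothetical. It builds the 16 gate prototypes, confirms that each polynomial reproduces its truth table, and counts how many of the 15 simplex directions change the mixed coefficients.

```python
from fractions import Fraction
from itertools import product

# Input pairs in the same order as the truth-table entries (f00, f01, f10, f11).
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def coefficients(table):
    """Map a truth table (f00, f01, f10, f11) to (c0, c1, c2, c3)
    such that f(a, b) = c0 + c1*a + c2*b + c3*a*b (assumed ordering)."""
    f00, f01, f10, f11 = table
    return (f00, f10 - f00, f01 - f00, f11 - f10 - f01 + f00)

def evaluate(c, a, b):
    """Evaluate the multilinear polynomial with coefficients c at (a, b)."""
    return c[0] + c[1] * a + c[2] * b + c[3] * a * b

def rank(rows):
    """Matrix rank via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

# All 16 two-input Boolean gates, one coefficient vector per gate.
tables = list(product((0, 1), repeat=4))
codebook = [coefficients(t) for t in tables]

# Each polynomial reproduces its gate's truth table on Boolean inputs.
for t, c in zip(tables, codebook):
    assert tuple(evaluate(c, a, b) for a, b in INPUTS) == t

# A simplex direction e_k - e_0 moves the mixed coefficients by c_k - c_0.
directions = [[ck - c0 for ck, c0 in zip(c, codebook[0])] for c in codebook[1:]]
codebook_rank = rank(directions)
null_dim = len(directions) - codebook_rank

# Coefficients of the uniform mixture over all 16 gates.
uniform_mix = tuple(Fraction(sum(c[j] for c in codebook), 16) for j in range(4))

print(codebook_rank, null_dim, uniform_mix)
```

Under these assumptions the 15 simplex directions span only a rank-4 set of coefficient changes, leaving 11 directions that do not move the polynomial at all. The uniform mixture has coefficients (1/2, 0, 0, 0), a constant function of its inputs, which is one plausible reading of why the backward signal vanishes at uniform initialization.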