🤖 AI Summary
This work addresses the challenge of scaling differentiable logic gate networks beyond binary logic to Kleene's three-valued logic, where the combinatorial explosion of 19,683 possible gates renders conventional softmax-based training infeasible. To overcome this, the authors propose Polynomial Surrogate Training (PST), which models each ternary neuron as a (2,2)-degree polynomial with only nine learnable coefficients, drastically reducing parameter count while ensuring a bounded and vanishing discrepancy between the continuous network and its discrete logic counterpart upon convergence. This approach enables, for the first time, efficient differentiable training of true ternary logic gates, naturally incorporating an UNKNOWN state that facilitates principled abstention under uncertainty. Experiments demonstrate 2–3× faster training on CIFAR-10 compared to binary networks, discovery of functionally rich ternary gates, and significantly improved selective prediction accuracy on synthetic and tabular tasks, where the UNKNOWN output serves as a Bayes-optimal proxy for uncertainty.
📝 Abstract
Differentiable logic gate networks (DLGNs) learn compact, interpretable Boolean circuits via gradient-based training, but all existing variants are restricted to the 16 two-input binary gates. Extending DLGNs to Kleene's three-valued $K_3$ logic, yielding differentiable ternary logic gate networks (DTLGNs) whose UNKNOWN state enables principled abstention under uncertainty, is desirable. However, the support set of potential gates per neuron explodes to $19{,}683$, making the established softmax-over-gates training approach intractable. We introduce Polynomial Surrogate Training (PST), which represents each ternary neuron as a degree-$(2,2)$ polynomial with 9 learnable coefficients (a $2{,}187\times$ parameter reduction) and prove that the gap between the trained network and its discretized logic circuit is bounded by a data-independent commitment loss that vanishes at convergence. Scaling experiments from 48K to 512K neurons on CIFAR-10 demonstrate that this hardening gap contracts with overparameterization. Ternary networks train $2$-$3\times$ faster than binary DLGNs and discover true ternary gates that are functionally diverse. On synthetic and tabular tasks we find that the UNKNOWN output acts as a Bayes-optimal uncertainty proxy, enabling selective prediction in which ternary circuits surpass binary accuracy once low-confidence predictions are filtered. More broadly, PST establishes a general polynomial-surrogate methodology whose parameterization cost grows only quadratically with logic valence, opening the door to many-valued differentiable logic.
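The gate counts and the surrogate's parameterization above can be checked with a short sketch. The encoding of the $K_3$ values as $\{-1, 0, +1\}$ and the `surrogate` helper below are illustrative assumptions, not the paper's implementation; the sketch only shows that (i) there are $3^{3^2} = 19{,}683$ two-input ternary gates, and (ii) a degree-$(2,2)$ polynomial in 9 coefficients can exactly interpolate any ternary truth table on the $3\times 3$ input grid, using Kleene AND ($\min$) as an example.

```python
import numpy as np

# Each of the 3*3 = 9 input combinations of a two-input ternary gate
# can map to any of 3 outputs, so there are 3**9 = 19,683 gates,
# versus 2**(2**2) = 16 binary gates.
n_gates = 3 ** (3 ** 2)

def surrogate(x, y, c):
    """Evaluate sum_{i,j in {0,1,2}} c[i,j] * x**i * y**j (c is 3x3).

    9 coefficients replace a 19,683-way softmax: a 2,187x reduction.
    """
    xs = np.array([1.0, x, x * x])
    ys = np.array([1.0, y, y * y])
    return xs @ c @ ys

# Hypothetical K_3 encoding (an assumption): FALSE=-1, UNKNOWN=0, TRUE=+1.
vals = np.array([-1.0, 0.0, 1.0])
X, Y = np.meshgrid(vals, vals, indexing="ij")

# Tensor-product Vandermonde system: one row per input pair, one column
# per monomial x^i y^j. It is invertible, so ANY ternary truth table is
# exactly representable by some choice of the 9 coefficients.
A = np.stack([X.ravel() ** i * Y.ravel() ** j
              for i in range(3) for j in range(3)], axis=1)

# Example target: Kleene AND, which is min(x, y) under this encoding.
c_and = np.linalg.solve(A, np.minimum(X, Y).ravel()).reshape(3, 3)

# The fitted polynomial reproduces Kleene AND on all 9 input pairs.
exact = all(abs(surrogate(x, y, c_and) - min(x, y)) < 1e-9
            for x in vals for y in vals)
```

During training the inputs are continuous relaxations rather than exact grid points, which is why the paper bounds the discrepancy between the polynomial network and its hardened (discretized) circuit rather than assuming exactness off the grid.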