🤖 AI Summary
Neural Disjunctive Normal Form (DNF) models often lose performance and interpretability after post-training symbolic translation, in part because the thresholding step fails to disentangle knowledge that is entangled across the networks' weights. To address this, we propose a disentanglement method that splits nodes encoding nested logical rules into smaller, independent sub-nodes, without requiring domain-specific priors. The approach applies to binary, multiclass, and multilabel classification, including tasks requiring predicate invention. It substantially narrows the performance gap between the trained neural model and its symbolic translation while yielding more compact and semantically transparent logical rules. Empirical evaluation across multiple benchmark datasets shows that the method achieves a favorable trade-off between classification accuracy and human-understandable interpretability.
📝 Abstract
Neural Disjunctive Normal Form (DNF)-based models are powerful and interpretable approaches to neuro-symbolic learning and have shown promising results in classification and reinforcement learning settings without prior knowledge of the tasks. However, their performance is degraded by the thresholding of the post-training symbolic translation process. We show here that part of the performance degradation during translation is due to the translation's failure to disentangle the learned knowledge represented in the form of the networks' weights. We address this issue by proposing a new disentanglement method; by splitting nodes that encode nested rules into smaller independent nodes, we are able to better preserve the models' performance. Through experiments on binary, multiclass, and multilabel classification tasks (including those requiring predicate invention), we demonstrate that our disentanglement method provides compact and interpretable logical representations for the neural DNF-based models, with performance closer to that of their pre-translation counterparts. Our code is available at https://github.com/kittykg/disentangling-ndnf-classification.
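The propositional intuition behind node splitting can be illustrated with a minimal sketch. Here the specific rule `A AND (B OR C)` and the function names are hypothetical examples, not taken from the paper: a single node whose weights effectively encode a nested rule loses semantics if thresholding collapses it into one flat conjunction, whereas splitting it into two independent conjunctive sub-nodes, combined by the disjunctive layer, preserves the rule exactly.

```python
from itertools import product

def nested_rule(a, b, c):
    # The logic one entangled node may effectively encode: A AND (B OR C).
    return a and (b or c)

def naive_flat(a, b, c):
    # Naive thresholding collapses the node into a single flat
    # conjunction, which is NOT equivalent to the nested rule.
    return a and b and c

def split_nodes(a, b, c):
    # Node splitting: two smaller independent conjunctive nodes,
    # (A AND B) and (A AND C), combined by the disjunctive layer.
    return (a and b) or (a and c)

# The split form agrees with the nested rule on every input,
# while the flat conjunction does not (e.g. A=1, B=1, C=0).
for a, b, c in product([False, True], repeat=3):
    assert split_nodes(a, b, c) == nested_rule(a, b, c)
```

The loop above checks equivalence over the full truth table; the flat conjunction disagrees on inputs such as `A=1, B=1, C=0`, which is one way the post-translation performance gap can arise.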