🤖 AI Summary
To address the lack of transparency and interpretability in deep learning models—often termed "black-box" systems—this paper proposes Neural Probabilistic Circuits (NPCs), an end-to-end trainable, intrinsically interpretable neuro-symbolic architecture. NPCs integrate attribute recognition networks with structured probabilistic circuits, enabling logic-driven, compositional, and traceable predictions. Theoretically, an NPC's prediction error is provably upper-bounded by a linear combination of its individual modules' errors; moreover, NPCs natively support both most-probable explanations and counterfactual explanations. A three-stage training algorithm—attribute recognition, circuit construction, and joint optimization—is introduced. Empirically, NPCs achieve predictive accuracy on par with state-of-the-art black-box models across four benchmark datasets while generating explanations that are precise, faithful, and human-understandable, advancing the practicality and deployability of trustworthy AI systems.
📝 Abstract
End-to-end deep neural networks have achieved remarkable success across various domains but are often criticized for their lack of interpretability. While post hoc explanation methods attempt to address this issue, they often fail to accurately represent these black-box models, resulting in misleading or incomplete explanations. To overcome these challenges, we propose an inherently transparent model architecture called Neural Probabilistic Circuits (NPCs), which enables compositional and interpretable predictions through logical reasoning. In particular, an NPC consists of two modules: an attribute recognition model, which predicts probabilities for various attributes, and a task predictor built on a probabilistic circuit, which enables logical reasoning over the recognized attributes to make class predictions. To train NPCs, we introduce a three-stage training algorithm comprising attribute recognition, circuit construction, and joint optimization. Moreover, we theoretically demonstrate that an NPC's error is upper-bounded by a linear combination of the errors of its modules. To further demonstrate the interpretability of NPCs, we provide both most-probable explanations and counterfactual explanations. Empirical results on four benchmark datasets show that NPCs strike a balance between interpretability and performance, achieving accuracy competitive with end-to-end black-box models while providing enhanced interpretability.
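The two-module pipeline the abstract describes can be sketched in miniature. The code below is an illustrative assumption, not the paper's implementation: the real NPC uses a learned attribute-recognition network and a structured probabilistic circuit, whereas this toy stand-in uses hand-written attribute probabilities and an explicit class-conditional table, and assumes the recognizer's output factorizes over attributes.

```python
# Toy sketch of an NPC-style prediction: compose per-attribute
# probabilities (attribute recognition) with a class-conditional table
# (a stand-in for the probabilistic circuit) by marginalizing over
# attribute assignments. All names and numbers here are illustrative.
from itertools import product

def npc_predict(attr_probs, class_given_attrs):
    """Return P(class) marginalized over all attribute assignments.

    attr_probs: list of dicts, one per attribute, mapping each attribute
        value to its predicted probability.
    class_given_attrs: dict mapping a tuple of attribute values to a
        dict of class probabilities.
    """
    class_scores = {}
    for assignment in product(*(d.keys() for d in attr_probs)):
        # Joint probability of this attribute assignment (assumes the
        # recognizer factorizes over attributes -- an illustrative choice).
        p_attrs = 1.0
        for d, value in zip(attr_probs, assignment):
            p_attrs *= d[value]
        for cls, p_cls in class_given_attrs[assignment].items():
            class_scores[cls] = class_scores.get(cls, 0.0) + p_attrs * p_cls
    return class_scores

# Hypothetical example: two binary attributes feeding a two-class task.
attr_probs = [
    {"striped": 0.9, "plain": 0.1},
    {"four_legged": 0.8, "other_legs": 0.2},
]
class_given_attrs = {
    ("striped", "four_legged"): {"zebra": 0.95, "other": 0.05},
    ("striped", "other_legs"): {"zebra": 0.10, "other": 0.90},
    ("plain", "four_legged"): {"zebra": 0.05, "other": 0.95},
    ("plain", "other_legs"): {"zebra": 0.01, "other": 0.99},
}
scores = npc_predict(attr_probs, class_given_attrs)
```

Because the prediction routes through explicit attribute assignments, each class score is traceable to the attribute values that produced it, which is the property the most-probable and counterfactual explanations build on.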