🤖 AI Summary
Problem: Prover-Verifier Games (PVGs) have not yet been applied to high-dimensional inputs such as images, while Concept Bottleneck Models (CBMs) rely on low-capacity linear predictors. As a result, neither approach alone offers interpretable, verifiable nonlinear classification on such data.
Method: We propose Neural Concept Verifier (NCV), a unified framework that combines PVGs with concept encodings. NCV uses minimally supervised concept discovery models to extract structured concept encodings from raw inputs; a prover selects a subset of these encodings, and a nonlinear verifier bases its classification exclusively on that subset.
Contribution/Results: NCV outperforms CBM and pixel-based PVG classifier baselines on high-dimensional, logically complex datasets and helps mitigate shortcut behavior. The results position NCV as a step toward accurate, verifiable classification of complex visual data.
📝 Abstract
While Prover-Verifier Games (PVGs) offer a promising path toward verifiability in nonlinear classification models, they have not yet been applied to complex inputs such as high-dimensional images. Conversely, Concept Bottleneck Models (CBMs) effectively translate such data into interpretable concepts but are limited by their reliance on low-capacity linear predictors. In this work, we introduce the Neural Concept Verifier (NCV), a unified framework combining PVGs with concept encodings for interpretable, nonlinear classification in high-dimensional settings. NCV achieves this by using recent minimally supervised concept discovery models to extract structured concept encodings from raw inputs. A prover then selects a subset of these encodings, which a verifier -- implemented as a nonlinear predictor -- uses exclusively for decision-making. Our evaluations show that NCV outperforms CBM and pixel-based PVG classifier baselines on high-dimensional, logically complex datasets and also helps mitigate shortcut behavior. Overall, we demonstrate NCV as a promising step toward performant, verifiable AI.
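The data flow described above (concept encodings, then a prover-selected subset, then a nonlinear verifier that sees only that subset) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the magnitude-based selection rule, the hard-coded concept vector, and the randomly initialized one-hidden-layer verifier are all assumptions made for illustration. In NCV the prover and verifier are trained models and the encodings come from a concept discovery model.

```python
import math
import random


def prover_select(concepts, k):
    """Hypothetical prover: pick the k concepts with the largest magnitude.

    Assumption: a fixed heuristic stands in for a learned prover.
    """
    ranked = sorted(range(len(concepts)), key=lambda i: abs(concepts[i]), reverse=True)
    return sorted(ranked[:k])


def mask_concepts(concepts, selected):
    """Zero out every concept the prover did not select."""
    keep = set(selected)
    return [c if i in keep else 0.0 for i, c in enumerate(concepts)]


def verifier_predict(masked, w1, b1, w2, b2):
    """Nonlinear verifier: one tanh hidden layer, then argmax over class scores."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, masked)) + b)
        for row, b in zip(w1, b1)
    ]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
    label = max(range(len(scores)), key=lambda j: scores[j])
    return label, scores


# Illustrative sizes and untrained random weights (assumptions, not from the paper).
rng = random.Random(0)
n_concepts, n_hidden, n_classes, k = 8, 4, 3, 3
w1 = [[rng.uniform(-1, 1) for _ in range(n_concepts)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_classes)]
b2 = [0.0] * n_classes

# Stand-in for the output of a concept discovery model on one input image.
concepts = [0.1, -2.0, 0.3, 1.5, 0.0, -0.2, 0.9, 0.05]

selected = prover_select(concepts, k)
masked = mask_concepts(concepts, selected)
label, scores = verifier_predict(masked, w1, b1, w2, b2)
print("selected concept indices:", selected)
print("predicted class:", label)
```

The point of the masking step is that the verifier's decision can depend only on the concepts the prover exposed, so changing an unselected concept leaves the prediction unchanged, which is what makes the selected subset usable as an explanation of the decision.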