🤖 AI Summary
This work addresses the lack of input-level correctness guarantees in machine learning models. We propose *Self-Proving* models, which generate, alongside each prediction, an interactive formal proof of correctness that is checked by a verifier V carrying a soundness guarantee. This is the first integration of interactive proofs into end-to-end model training to achieve input-level verifiability. Our approach combines a theoretical convergence analysis, a tailored proof protocol for GCD computation, and a Transformer-based architecture: the trained model generates accepted proofs with high probability over random inputs, while the soundness of V ensures that incorrect outputs are rejected on every input. Experiments demonstrate that the trained models not only compute the greatest common divisor (GCD) of two integers with high accuracy but also produce corresponding machine-checkable correctness proofs. This constitutes the first empirical validation that learned models can provide formally verifiable, input-level correctness guarantees, establishing both feasibility and effectiveness.
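To make the verifier's role concrete, below is a minimal sketch of a sound check for GCD claims based on the Bézout identity. This is our own illustration of the *kind* of check a verifier V can perform, not necessarily the paper's exact protocol; the function name and demo values are hypothetical.

```python
# Minimal sketch (our illustration, not the paper's exact protocol): a sound
# checker for the claim "d == gcd(a, b)", certified by Bezout coefficients.
import math


def verify_gcd_certificate(a: int, b: int, d: int, u: int, v: int) -> bool:
    """Accept iff (d, u, v) certifies that d == gcd(a, b).

    Soundness argument: the divisibility checks make d a common divisor
    of a and b (so d <= gcd(a, b)), and d == u*a + v*b makes d an integer
    combination of a and b (so gcd(a, b) divides d). Together these force
    d == gcd(a, b), no matter who produced the certificate.
    """
    if d <= 0:
        return False
    return a % d == 0 and b % d == 0 and u * a + v * b == d


if __name__ == "__main__":
    a, b = 252, 105
    d = math.gcd(a, b)  # 21; an honest prover derives u, v via extended Euclid
    u, v = -2, 5        # -2*252 + 5*105 == 21
    assert verify_gcd_certificate(a, b, d, u, v)       # correct claim: accepted
    assert not verify_gcd_certificate(a, b, 7, 1, -2)  # incorrect claim: rejected
```

Note that the incorrect claim is rejected regardless of how its certificate was produced; this is the input-level soundness property the summary refers to.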
📝 Abstract
How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured *on average* over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically founded solution to this problem: to train *Self-Proving models* that prove the correctness of their output to a verification algorithm V via an Interactive Proof. Self-Proving models guarantee that, with high probability over a random input, the model generates a correct output *and* successfully proves its correctness to V. The *soundness* property of V guarantees that, for *every* input, no model can convince V of the correctness of an incorrect output. Thus, a Self-Proving model proves the correctness of most of its outputs, while *all* incorrect outputs (of any model) are detected by V. We devise a generic method for learning Self-Proving models, and we prove convergence bounds under certain assumptions. The theoretical framework and results are complemented by experiments on an arithmetic capability: computing the greatest common divisor (GCD) of two integers. Our learning method is used to train a Self-Proving transformer that computes the GCD *and* proves the correctness of its answer.
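For readers who prefer a formal statement, the two guarantees in the abstract can be written roughly as follows. The notation is ours, not the paper's: μ is the input distribution, ⟨P, V⟩(x, y) denotes V's accept/reject decision after interacting with a prover P about a claimed output y for input x, and ε and s are the completeness and soundness errors.

```latex
% Rough formalization in our own notation; the paper's exact
% definitions and parameter names may differ.

% Verifiability (holds with high probability over a random input):
\Pr_{x \sim \mu}\left[ M(x) \text{ is correct} \;\wedge\;
  \langle M, V \rangle(x, M(x)) = \mathsf{accept} \right] \ge 1 - \varepsilon

% Soundness (holds for every input and every prover):
\forall x \;\forall P:\quad y \text{ incorrect} \;\Longrightarrow\;
  \Pr\left[ \langle P, V \rangle(x, y) = \mathsf{accept} \right] \le s
```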