🤖 AI Summary
Verifying floating-point neural-network outputs on untrusted hardware (e.g., cloud GPUs) is challenging because floating-point execution is nondeterministic across devices and kernels.
Method: We propose the first trustless optimistic verification protocol requiring no trusted hardware. It jointly leverages theoretical error bounds and cross-hardware empirical error distributions to construct operator-level tolerance regions. We design a Merkle-tree-based, threshold-guided dispute game, augmented with lightweight voting arbitration, enabling fine-grained, low-overhead trustless verification.
Contribution/Results: Experiments on CNNs, Transformers, and diffusion models show that our empirically derived tolerances are 100–1,000× tighter than worst-case IEEE-754 bounds. Adversarial attacks achieve a 0% success rate against the protocol, which incurs only 0.3% end-to-end verification overhead. We have integrated it into a PyTorch-compatible runtime and deployed an Ethereum Holesky smart contract, fully validated on testnet.
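The operator-level tolerance regions above combine a sound worst-case bound with a tighter data-driven threshold. A minimal sketch of that idea is below; the function names (`worst_case_bound`, `empirical_threshold`, `accept`) and the specific margin/percentile values are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

# Illustrative sketch of operator-level tolerance regions (hypothetical
# names, not the paper's API). FP32 unit roundoff u = 2**-24.

def worst_case_bound(n_terms, magnitude, u=2**-24):
    """Sound IEEE-754 worst-case error bound for an n-term FP32
    accumulation: |err| <= gamma_n * magnitude, where
    gamma_n = n*u / (1 - n*u) (standard rounding-error analysis)."""
    gamma = n_terms * u / (1.0 - n_terms * u)
    return gamma * magnitude

def empirical_threshold(deviations, percentile=99.9, margin=2.0):
    """Tight empirical tolerance: a high percentile of the deviations
    observed across hardware during calibration, inflated by a margin.
    (percentile and margin here are made-up example values.)"""
    return margin * np.percentile(np.abs(deviations), percentile)

def accept(claimed, reference, tol):
    """Accept a claimed operator output if it lies within the tolerance
    region around a re-executed reference, instead of bit equality."""
    return bool(np.max(np.abs(claimed - reference)) <= tol)
```

An output that drifts within cross-hardware noise passes; one pushed well outside the calibrated band is rejected and would escalate to the dispute game.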
📄 Abstract
Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings). Verifying outputs is hard because floating-point (FP) execution on heterogeneous accelerators is inherently nondeterministic. Existing approaches are either impractical for real FP neural networks or reintroduce vendor trust. We present NAO: a Nondeterministic tolerance Aware Optimistic verification protocol that accepts outputs within principled operator-level acceptance regions rather than requiring bitwise equality. NAO combines two error models: (i) sound per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. Discrepancies trigger a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until one operator remains, where adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results finalize after a challenge window, without requiring trusted hardware or deterministic kernels. We implement NAO as a PyTorch-compatible runtime and a contract layer currently deployed on the Ethereum Holesky testnet. The runtime instruments graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers, and diffusion models on A100, H100, RTX6000, and RTX4090 GPUs, empirical thresholds are $10^2$–$10^3\times$ tighter than theoretical bounds, and bound-aware adversarial attacks achieve 0% success. NAO reconciles scalability with verifiability for real-world heterogeneous ML compute.
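The dispute game's recursive partitioning can be sketched as a bisection over commitments to intermediate activations: the two parties agree up to some operator and disagree afterward, so binary search isolates the single disputed operator for adjudication. Everything below is a simplified illustration under the assumption of a linear operator chain and monotone divergence (once the activations diverge, all later commitments differ); `commit` and `bisect_dispute` are hypothetical names, not the paper's interface.

```python
import hashlib

def commit(activation_bytes: bytes) -> str:
    """Commitment to an intermediate activation (here a plain SHA-256;
    the real protocol anchors these in a Merkle tree on-chain)."""
    return hashlib.sha256(activation_bytes).hexdigest()

def bisect_dispute(prover_commits, verifier_commits):
    """Return the index of the first operator whose post-activation
    commitments diverge. Assumes commitments agree on a prefix and
    disagree on the suffix (monotone divergence along the chain)."""
    lo, hi = 0, len(prover_commits) - 1
    assert prover_commits[hi] != verifier_commits[hi], "no dispute to settle"
    while lo < hi:
        mid = (lo + hi) // 2
        if prover_commits[mid] == verifier_commits[mid]:
            lo = mid + 1  # still agree here: divergence is strictly later
        else:
            hi = mid      # already diverged: divergence is at or before mid
    return lo  # the single operator left to adjudicate
```

At the returned operator, adjudication then reduces to the cheap checks the abstract describes: a theoretical-bound test, or a small honest-majority vote against the empirical thresholds.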