V3DB: Audit-on-Demand Zero-Knowledge Proofs for Verifiable Vector Search over Committed Snapshots

Date: 2026-03-03
Citations: 0
Influential: 0
AI Summary
This work addresses the lack of auditability in existing dense retrieval systems, where clients cannot verify that the returned top-k results are correct. The authors propose the first verifiable vector search service that supports on-demand auditing, standardizing the IVF-PQ retrieval process into a fixed, five-step query semantics. Their approach combines cryptographic commitments with multiset equality and inclusion checks to generate succinct zero-knowledge proofs efficiently, avoiding expensive in-circuit sorting and random memory accesses. Implemented atop Plonky2, the prototype preserves result correctness while proving up to 22× faster than a circuit-only baseline, using up to 40% less peak memory, and verifying in milliseconds.
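To make the idea of replacing in-circuit sorting with a multiset check concrete, here is a minimal sketch in plain Python. It is not the paper's circuit: it shows the generic randomized-fingerprint technique, in which two lists hold the same multiset if the products of `(r - v)` agree at a random challenge `r`, plus a simple boundary condition for a claimed top-k split. The function names, the integer-valued distances, and the use of a single challenge in the 64-bit Goldilocks field (the base field Plonky2 works over) are assumptions made for illustration; a real proof system would derive the challenge from the transcript and typically draw it from a larger field.

```python
import secrets

# Goldilocks prime 2^64 - 2^32 + 1 (assumed here for illustration only).
P = 2**64 - 2**32 + 1


def fingerprint(values, r):
    """Return prod(r - v) mod P, a randomized fingerprint of the multiset."""
    acc = 1
    for v in values:
        acc = acc * ((r - v) % P) % P
    return acc


def multiset_equal(a, b, r=None):
    """Probabilistic check that lists a and b contain the same multiset."""
    if len(a) != len(b):
        return False
    if r is None:
        r = secrets.randbelow(P)  # stand-in for a verifier challenge
    return fingerprint(a, r) == fingerprint(b, r)


def check_top_k(distances, selected, rest, k):
    """Check a claimed top-k split of `distances` without sorting.

    1. `selected` has exactly k entries.
    2. `selected + rest` is a permutation of `distances` (multiset check).
    3. Boundary condition: nothing in `rest` beats the worst selected entry.
    """
    if len(selected) != k:
        return False
    if not multiset_equal(distances, selected + rest):
        return False
    if not selected or not rest:
        return True
    threshold = max(selected)
    return all(d >= threshold for d in rest)


distances = [7, 3, 9, 1, 5]
print(check_top_k(distances, [1, 3], [7, 9, 5], 2))  # honest split
print(check_top_k(distances, [1, 7], [3, 9, 5], 2))  # violates the boundary
```

The check touches each value a constant number of times, which is why this style of argument tends to be much cheaper inside a circuit than a sorting network.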

Abstract
Dense retrieval services increasingly underpin semantic search, recommendation, and retrieval-augmented generation, yet clients typically receive only a top-$k$ list with no auditable evidence of how it was produced. We present V3DB, a verifiable, versioned vector-search service that enables audit-on-demand correctness checks for approximate nearest-neighbour (ANN) retrieval executed by a potentially untrusted service provider. V3DB commits to each corpus snapshot and standardises an IVF-PQ search pipeline into a fixed-shape, five-step query semantics. Given a public snapshot commitment and a query embedding, the service returns the top-$k$ payloads and, when challenged, produces a succinct zero-knowledge proof that the output is exactly the result of executing the published semantics on the committed snapshot -- without revealing the embedding corpus or private index contents. To make proving practical, V3DB avoids costly in-circuit sorting and random access by combining multiset equality/inclusion checks with lightweight boundary conditions. Our prototype implementation based on Plonky2 achieves up to $22\times$ faster proving and up to $40\%$ lower peak memory consumption than the circuit-only baseline, with millisecond-level verification time. The code is available on GitHub at https://github.com/TabibitoQZP/zk-IVF-PQ.
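For readers unfamiliar with IVF-PQ, the following sketch shows a textbook query in plain Python. It is an illustration under stated assumptions, not V3DB's implementation: the abstract does not spell out the five steps, so the five-stage breakdown in the comments is a common way to describe IVF-PQ rather than the paper's published semantics. The function `ivf_pq_search`, its data layout, and the choice to quantize raw vectors instead of residuals are all hypothetical simplifications.

```python
import heapq


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ivf_pq_search(query, centroids, codebooks, lists, nprobe, k):
    """Toy IVF-PQ query (hypothetical layout, for illustration only).

    centroids: coarse centroid vectors, one per inverted list
    codebooks: codebooks[m][c] is codeword c of sub-quantizer m
    lists:     lists[j] holds (item_id, code) pairs for centroid j,
               where code[m] indexes into codebooks[m]
    """
    # 1. Distance from the query to every coarse centroid.
    coarse = [(sq_dist(query, c), j) for j, c in enumerate(centroids)]
    # 2. Keep the nprobe closest inverted lists.
    probed = [j for _, j in heapq.nsmallest(nprobe, coarse)]
    # 3. Build the lookup table: query sub-vector vs. each codeword.
    m = len(codebooks)
    sub = len(query) // m
    table = [
        [sq_dist(query[i * sub:(i + 1) * sub], cw) for cw in codebooks[i]]
        for i in range(m)
    ]
    # 4. Approximate each candidate's distance by summing table entries.
    candidates = []
    for j in probed:
        for item_id, code in lists[j]:
            dist = sum(table[i][code[i]] for i in range(m))
            candidates.append((dist, item_id))
    # 5. Select the k candidates with the smallest approximate distance.
    return [item_id for _, item_id in heapq.nsmallest(k, candidates)]


# Tiny example: 2-D vectors, two sub-quantizers of dimension 1.
centroids = [[0.0, 0.0], [10.0, 10.0]]
codebooks = [[[0.0], [1.0], [10.0]], [[0.0], [1.0], [10.0]]]
lists = [[("a", (0, 0)), ("b", (1, 1))], [("c", (2, 2))]]
print(ivf_pq_search([0.9, 0.9], centroids, codebooks, lists, nprobe=1, k=2))
```

Steps 2 and 5 are selections and steps 3 and 4 are table lookups. These are the kinds of operations that would naively require sorting and random access inside a circuit, which is the cost the abstract says V3DB sidesteps.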
Problem

Research questions and friction points this paper is trying to address.

verifiable vector search
zero-knowledge proofs
audit-on-demand
committed snapshots
approximate nearest-neighbour retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-knowledge proof
verifiable vector search
committed snapshots
IVF-PQ
audit-on-demand