IVF-TQ: Streaming-Robust Approximate Nearest Neighbor Search via a Codebook-Free Residual Layer

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the degradation of codebook-based approximate nearest neighbor (ANN) indexes as their trained codebooks go stale under streaming ingestion, and proposes IVF-TQ, an IVF index in which only the coarse partition is trained. The residual layer is codebook-free: a fixed random rotation followed by precomputed Lloyd-Max scalar quantization that depends only on the bit width b and the dimension d, so there is no residual codebook to retrain. The authors note that IVF around a codebook-free residual quantizer is not architecturally new (IVF-RaBitQ already ships in several systems). The stated contributions are multi-seed streaming evidence at 10M scale across PQ memory budgets, a uniform-over-sphere inner-product error bound for the TurboQuant residual quantizer with one fixed rotation (a proof sketch in v1), and Adaptive IVF-TQ, a partition-only refresh that recovers recall from 67% to 97.8% with re-ranking (90.3% without) under worst-case rotation shift. On streaming Deep-10M, IVF-TQ recall moves from 87.4% to 86.6% (-0.80pp) while IVF-PQ loses 3.23pp; in a shuffled-i.i.d. control on SIFT-1M, IVF-PQ loses 3.9pp even without distribution shift. At roughly 1.5x the memory, higher-bit PQ reaches higher absolute recall, so the durable benefit of IVF-TQ is operational rather than absolute accuracy.

📝 Abstract

We propose IVF-TQ, an IVF index with a codebook-free residual layer: a fixed random rotation followed by precomputed Lloyd-Max scalar quantization depending only on (b, d). Only the IVF coarse partition is trained. Building on TurboQuant (Zandieh et al., 2025), the design substantially reduces a key failure mode of trained-codebook ANN indexes (PQ, OPQ, ScaNN): staleness under streaming ingestion.

Empirical (3 seeds): Per-batch PQ retraining does not recover the streaming gap at any tested bit budget (paired-t p > 0.28 everywhere). On streaming Deep-10M, IVF-TQ holds at 87.4% -> 86.6% (Delta = -0.80 +/- 0.10pp) while IVF-PQ degrades -3.23pp. A shuffled-i.i.d. control on SIFT-1M shows IVF-PQ losing -3.9pp without distribution shift. At higher PQ bit budgets (~1.5x IVF-TQ memory), absolute recall favors PQ as expected from rate-distortion (+6.1pp Deep-10M; +2.0pp SIFT-10M); the durable IVF-TQ benefit is operational (no codebook to retrain), robust across memory regimes.

Prior art: IVF around a codebook-free residual quantizer is architecturally not new -- IVF-RaBitQ ships in Milvus, cuVS, LanceDB, Weaviate; Shi et al. (2026) is concurrent GPU work. TurboQuant itself tests only flat-rotation ANN.

Contributions: (i) A multi-seed streaming-operational story for codebook-free IVF: 10M-scale evidence across PQ memory budgets. (ii) A uniform-over-sphere IP-error bound for the TQ residual quantizer with one fixed rotation (proof sketch in v1; rigorous in v2). (iii) Adaptive IVF-TQ: a partition-only refresh recovering 67% -> 97.8% under worst-case rotation shift with re-ranking (90.3% without).

Code, data: https://github.com/tarun-ks/turboquant_search
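To make the "codebook-free residual layer" concrete, the following is a minimal standard-library sketch of the idea as the abstract describes it: one fixed random rotation, then a scalar quantizer whose levels depend only on (b, d). It is not the authors' implementation. The function names, the Gaussian N(0, 1/d) approximation of a rotated unit-vector coordinate, and the choice to store the residual norm separately are all assumptions made for illustration.

```python
import math
import random


def _pdf(x):
    # Standard normal density; evaluates to 0.0 at +/- infinity.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lloyd_max_levels(bits, dim, iters=200):
    """Lloyd-Max levels for N(0, 1/dim): a function of (bits, dim) only.

    Assumption: a coordinate of a randomly rotated unit vector is treated
    as Gaussian with variance 1/dim. No data is used to fit the levels.
    """
    k = 2 ** bits
    levels = [-2.0 + 4.0 * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        mids = [(levels[i] + levels[i + 1]) / 2.0 for i in range(k - 1)]
        edges = [-math.inf] + mids + [math.inf]
        # Each level becomes the conditional mean of its cell.
        levels = [
            (_pdf(a) - _pdf(b)) / (_cdf(b) - _cdf(a))
            for a, b in zip(edges[:-1], edges[1:])
        ]
    return [lv / math.sqrt(dim) for lv in levels]


def random_rotation(dim, seed=0):
    """Fixed orthonormal matrix: Gram-Schmidt on seeded Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(dim):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def encode(residual, rotation, levels):
    """Return (norm, codes): rotate the unit residual, snap to nearest level."""
    norm = math.sqrt(sum(x * x for x in residual))
    if norm == 0.0:
        return 0.0, [0] * len(residual)
    rotated = [sum(r * x for r, x in zip(row, residual)) / norm for row in rotation]
    codes = [
        min(range(len(levels)), key=lambda j: abs(levels[j] - y)) for y in rotated
    ]
    return norm, codes


def decode(norm, codes, rotation, levels):
    """Approximate residual: look up levels, apply the transposed rotation."""
    y = [levels[c] for c in codes]
    dim = len(codes)
    return [norm * sum(rotation[i][j] * y[i] for i in range(dim)) for j in range(dim)]
```

In an IVF setting, `residual` would be a vector minus its assigned coarse centroid. Under this sketch, new data changes only the coarse centroids; `rotation` and `levels` stay fixed, which is the property the abstract credits for avoiding codebook staleness.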

Problem

Research questions and friction points this paper is trying to address.

streaming ingestion

approximate nearest neighbor search

codebook staleness

IVF index

residual quantization

Innovation

Methods, ideas, or system contributions that make the work stand out.

codebook-free quantization

streaming-robust ANN

IVF-TQ