AI Summary
To address the inefficiency and poor scalability of vector retrieval caused by the high dimensionality of Transformer embeddings, this paper proposes the first game-theoretic latent-space compression framework. It formulates compression as a zero-sum game between semantic preservation (similarity fidelity) and dimensionality reduction (compression ratio), achieving Pareto-optimal trade-offs via adversarial optimization. The method jointly integrates Transformer embeddings, differentiable quantization, and game-equilibrium solving, enabling end-to-end training and seamless compatibility with industrial indexing libraries such as FAISS. Evaluated on standard benchmarks, the approach achieves an average similarity score of 0.9981 and a retrieval utility of 0.8873, substantially outperforming FAISS (0.5517 and 0.5194, respectively). Moreover, it integrates transparently into large-model retrieval pipelines without architectural modification.
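The zero-sum formulation above can be illustrated with a minimal sketch: one player's payoff rewards similarity fidelity, the other's rewards the compression ratio, and the two are combined into a single objective. The `similarity_fidelity` measure, the random-projection compressor, and the weight `alpha` below are all assumptions for illustration; the paper's learned transform and exact metrics are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def similarity_fidelity(X, Z):
    """Hypothetical fidelity measure: how well the pairwise cosine
    similarities of the compressed embeddings Z match those of the
    original embeddings X (1.0 = perfectly preserved)."""
    def cos_matrix(A):
        A = A / np.linalg.norm(A, axis=1, keepdims=True)
        return A @ A.T
    return 1.0 - np.abs(cos_matrix(X) - cos_matrix(Z)).mean()

# Toy embeddings: 100 vectors of dimension 64.
X = rng.standard_normal((100, 64))

def payoff(k, alpha=0.5):
    """Zero-sum payoff: what the fidelity player gains, the
    compression player loses. alpha weights the two objectives
    (an assumed hyperparameter, not from the paper)."""
    P = rng.standard_normal((64, k)) / np.sqrt(k)  # stand-in compressor
    fidelity = similarity_fidelity(X, X @ P)
    ratio = 1.0 - k / 64  # reward for shrinking the dimension
    return alpha * fidelity + (1 - alpha) * ratio

for k in (8, 16, 32):
    print(f"k={k:2d}  payoff={payoff(k):.3f}")
```

Sweeping `k` exposes the trade-off the paper optimizes: smaller `k` raises the compression reward but erodes similarity fidelity, and the equilibrium balances the two.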
Abstract
Vector similarity search plays a pivotal role in modern information retrieval systems, especially when powered by transformer-based embeddings. However, the scalability and efficiency of such systems are often hindered by the high dimensionality of latent representations. In this paper, we propose a novel game-theoretic framework for optimizing latent-space compression to enhance both the efficiency and semantic utility of vector search. By modeling the compression strategy as a zero-sum game between retrieval accuracy and storage efficiency, we derive a latent transformation that preserves semantic similarity while reducing redundancy. We benchmark our method against FAISS, a widely used vector search library, and demonstrate that our approach achieves a substantially higher average similarity (0.9981 vs. 0.5517) and utility (0.8873 vs. 0.5194), albeit with a modest increase in query time. This trade-off highlights the practical value of game-theoretic latent compression in high-utility, transformer-based search applications. The proposed system can be integrated seamlessly into existing LLM pipelines to yield more semantically accurate and computationally efficient retrieval.
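A retrieval-utility comparison of the kind reported above can be sketched as follows: run exact inner-product search in the original space as ground truth, repeat it in a compressed space, and score the overlap. The random projection and the recall@10 proxy for "utility" are assumptions for illustration; the exact brute-force search below computes the same result as `faiss.IndexFlatIP` without requiring the library.

```python
import numpy as np

rng = np.random.default_rng(1)

def topk(index_vectors, queries, k):
    """Exact inner-product search (equivalent to a flat IP index)."""
    scores = queries @ index_vectors.T
    return np.argsort(-scores, axis=1)[:, :k]

# Toy corpus and queries in the original 64-dim embedding space.
corpus = rng.standard_normal((500, 64))
queries = rng.standard_normal((20, 64))

# Compress both sides with a shared random projection to 16 dims
# (a stand-in for the learned game-theoretic transform).
P = rng.standard_normal((64, 16)) / np.sqrt(16)
gt = topk(corpus, queries, k=10)               # full-dimension truth
approx = topk(corpus @ P, queries @ P, k=10)   # compressed search

# Hypothetical utility proxy: mean recall@10 of compressed vs. exact.
recall = np.mean([len(set(a) & set(g)) / 10 for a, g in zip(approx, gt)])
print(f"recall@10 after 4x compression: {recall:.2f}")
```

Replacing the random projection with a trained compressor and the toy vectors with real transformer embeddings would turn this harness into the head-to-head evaluation the abstract describes.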