🤖 AI Summary
Traditional vector retrieval takes a single query vector as input, which limits its effectiveness in complex reasoning and multi-example retrieval scenarios. This work proposes a multi-query vector retrieval method based on anomalous pattern detection. Given a set of query vectors, the approach identifies the subset of dimensions that stand out as anomalous relative to the remaining dimensions, then retrieves the database vectors that are also anomalous along those dimensions. The method is validated on two image datasets, a text dataset, and a tabular dataset. Across most datasets, larger query sets improve retrieval performance; the improvement is most pronounced as the query set grows from one to eight examples, with smaller gains beyond that.
📝 Abstract
A classical vector retrieval problem typically takes a \emph{single} query embedding vector as input and retrieves the most similar embedding vectors from a vector database. However, complex reasoning and retrieval tasks frequently require \emph{multiple query vectors} rather than a single one. In this work, we propose a retrieval method that considers multiple query vectors simultaneously and retrieves the most relevant vectors from the database using concepts from anomalous pattern detection. Specifically, our approach takes a set of query vectors $Q$ (with $|Q|\geq 1$) and identifies the subset of vector dimensions within $Q$ that stand out as anomalous relative to the rest of the dimensions. Next, we scan the vector database for the vectors that are also anomalous along the previously identified dimensions and return them as the retrieved set. We validate our approach on two image datasets, a text dataset, and a tabular dataset. Overall, we observe that, across most datasets, larger query sets lead to improved retrieval performance. The improvement is most pronounced when the query set size increases from 1 to 8, while the gains become smaller beyond that.
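The two-step procedure described above (select the anomalous dimensions of $Q$, then retrieve the database vectors that are anomalous along the same dimensions) can be illustrated with a minimal sketch. The abstract does not specify the anomaly score, so this sketch substitutes a plain per-dimension z-score against the database statistics; the function names `anomalous_dims` and `retrieve` and the parameters `k_dims` and `top_n` are hypothetical and are not taken from the paper.

```python
import numpy as np

def anomalous_dims(queries, db, k_dims):
    """Pick the k_dims dimensions where the query set deviates most, on average,
    from the database (z-score stand-in; an assumption, not the paper's score)."""
    queries = np.atleast_2d(queries)           # also accepts a single query, |Q| = 1
    mu = db.mean(axis=0)
    sigma = db.std(axis=0) + 1e-12             # guard against zero variance
    mean_z = ((queries - mu) / sigma).mean(axis=0)
    dims = np.argsort(-np.abs(mean_z))[:k_dims]
    return dims, np.sign(mean_z[dims])         # direction of the deviation per dimension

def retrieve(queries, db, k_dims=2, top_n=10):
    """Rank database vectors by how strongly they deviate, in the same
    direction as the queries, along the selected dimensions."""
    dims, signs = anomalous_dims(queries, db, k_dims)
    mu = db[:, dims].mean(axis=0)
    sigma = db[:, dims].std(axis=0) + 1e-12
    scores = (((db[:, dims] - mu) / sigma) * signs).sum(axis=1)
    return np.argsort(-scores)[:top_n], dims
```

In this sketch, averaging the standardized values over all queries is what lets additional query examples help: deviations shared by the whole query set are reinforced, while deviations specific to one query tend to average out. Fixing `k_dims` in advance is a simplification; a method that searches over subsets of dimensions would choose the subset size from the data.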