WebANNS: Fast and Efficient Approximate Nearest Neighbor Search in Web Browsers

📅 2025-07-01
🤖 AI Summary
To address the core challenges of approximate nearest neighbor search (ANNS) in browser environments (limited compute, restricted access to external storage, and tight memory budgets), this paper proposes WebANNS, an ANNS engine designed specifically for web browsers. It uses WebAssembly to accelerate compute-intensive operations, a fine-grained lazy-loading mechanism that avoids materializing the full dataset in memory, and a heuristic strategy to reduce memory usage. Experiments show that, while preserving retrieval accuracy, the engine improves 99th-percentile query latency by up to 743.8× over the state-of-the-art (SOTA) engine, bringing query time from about 10 seconds down to the 10-millisecond range, and reduces memory usage by up to 39%. This makes in-browser ANNS practical with user-acceptable latency.

📝 Abstract
Approximate nearest neighbor search (ANNS) has become vital to modern AI infrastructure, particularly in retrieval-augmented generation (RAG) applications. Numerous in-browser ANNS engines have emerged to seamlessly integrate with popular LLM-based web applications, while addressing privacy protection and challenges of heterogeneous device deployments. However, web browsers present unique challenges for ANNS, including computational limitations, external storage access issues, and memory utilization constraints, which state-of-the-art (SOTA) solutions fail to address comprehensively. We propose WebANNS, a novel ANNS engine specifically designed for web browsers. WebANNS leverages WebAssembly to overcome computational bottlenecks, designs a lazy loading strategy to optimize data retrieval from external storage, and applies a heuristic approach to reduce memory usage. Experiments show that WebANNS is fast and memory efficient, achieving up to 743.8× improvement in 99th percentile query latency over the SOTA engine, while reducing memory usage by up to 39%. Note that WebANNS decreases query time from 10 seconds to the 10-millisecond range in browsers, making in-browser ANNS practical with user-acceptable latency.
Problem

Research questions and friction points this paper is trying to address.

Address computational limitations in web browser ANNS
Optimize external storage access for in-browser ANNS
Reduce memory usage in web-based ANNS engines
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses WebAssembly to overcome computational bottlenecks
Implements lazy loading for optimized external storage access
Applies heuristic approach to reduce memory usage
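
The lazy-loading idea can be illustrated with a small sketch. The summary above does not describe the paper's actual mechanism, so the following is only a generic example of the general technique: vectors are fetched from external storage on first access and kept in a bounded in-memory cache with least-recently-used eviction. The names `Loader`, `LazyVectorCache`, and `toyStorage` are hypothetical and do not come from WebANNS. The loader is synchronous for brevity; a real browser store such as IndexedDB would be asynchronous.

```typescript
// Hypothetical loader: stands in for a read from external storage.
// Assumption: synchronous here for brevity; IndexedDB reads are async.
type Loader = (id: number) => Float32Array;

class LazyVectorCache {
  // Map preserves insertion order, so the first key is the least recently used.
  private cache = new Map<number, Float32Array>();
  private capacity: number; // assumed to be >= 1
  private load: Loader;
  loads = 0; // how many times external storage was actually read

  constructor(capacity: number, load: Loader) {
    this.capacity = capacity;
    this.load = load;
  }

  get(id: number): Float32Array {
    const hit = this.cache.get(id);
    if (hit !== undefined) {
      // Re-insert to mark this entry as most recently used.
      this.cache.delete(id);
      this.cache.set(id, hit);
      return hit;
    }
    // Cache miss: fetch from storage only now, not up front.
    const vec = this.load(id);
    this.loads++;
    if (this.cache.size >= this.capacity) {
      // Evict the least recently used entry to bound memory.
      const oldest = this.cache.keys().next().value as number;
      this.cache.delete(oldest);
    }
    this.cache.set(id, vec);
    return vec;
  }
}

// Toy "storage" for illustration: vector i is [i, i + 1].
const toyStorage: Loader = (id) => new Float32Array([id, id + 1]);
const cache = new LazyVectorCache(2, toyStorage);
cache.get(1);
cache.get(2);
cache.get(1); // cache hit, no storage read
cache.get(3); // miss; evicts id 2, the least recently used
```

In this sketch the cache capacity is the knob that trades memory for storage reads: a smaller capacity lowers the memory footprint but causes more misses during a search.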
Mugeng Liu
Peking University
WebAssembly, AI for SE, AI for System
Siqi Zhong
Fudan University, Shanghai, China
Qi Yang
Institute for Artificial Intelligence, Peking University, Beijing, China
Yudong Han
Institute for Artificial Intelligence, Peking University, Beijing, China
Xuanzhe Liu
Boya Distinguished Professor, Peking University, ACM Distinguished Scientist
Machine Learning System, Mobile Computing System, Serverless Computing
Yun Ma
Assistant Professor, Peking University
Web, Mobile Computing, Software Engineering, Service