Improving Search Suggestions for Alphanumeric Queries

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional lexical- or embedding-based retrieval methods perform poorly on sparse, non-linguistic alphanumeric product identifiers (e.g., MPNs, SKUs) due to their sensitivity to tokenization and spelling variations. This work proposes a training-free, character-level retrieval framework that encodes identifiers into fixed-length binary vectors, enabling efficient similarity computation via Hamming distance and scalable retrieval over large corpora through approximate nearest neighbor search. An optional edit-distance-based reranking stage further enhances precision. By replacing complex dense models with an interpretable, learning-free representation, the approach significantly improves search suggestion quality while maintaining low latency. A/B testing demonstrates clear gains in key business metrics, confirming its effectiveness and practicality in production environments.
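Neither the summary nor the abstract says how characters are mapped to bits. The sketch below shows one plausible training-free scheme, under stated assumptions: normalize the identifier (upper-case it and drop separators), split it into character bigrams with boundary markers, and hash each bigram to one position in a 256-bit vector held in a Python int. The names `normalize`, `char_bigrams`, `encode`, `hamming` and `NUM_BITS`, the bigram choice, and the BLAKE2 hashing are illustrative assumptions, not the authors' method.

```python
import hashlib

NUM_BITS = 256  # hypothetical code length; the paper's actual length is not stated


def normalize(identifier: str) -> str:
    """Assumed normalization: upper-case and drop separators such as '-', '/', ' '."""
    return "".join(ch for ch in identifier.upper() if ch.isalnum())


def char_bigrams(s: str) -> list[str]:
    """Character bigrams with '^' and '$' boundary markers so prefixes and suffixes count."""
    padded = f"^{s}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def encode(identifier: str, num_bits: int = NUM_BITS) -> int:
    """Set one bit per bigram, chosen by a deterministic hash; the int holds the bit vector."""
    code = 0
    for gram in char_bigrams(normalize(identifier)):
        digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=8).digest()
        code |= 1 << (int.from_bytes(digest, "big") % num_bits)
    return code


def hamming(a: int, b: int) -> int:
    """Number of bit positions where the two codes differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    print(encode("ABC-123") == encode("abc123"))  # True: same normalized string
    print(hamming(encode("ABC123"), encode("ABC124")))  # at most 4: only 4 bigrams differ
```

Because a bit can only differ when it comes from a bigram present in one identifier but not the other, the Hamming distance is bounded by the number of unshared bigrams. A single-character typo therefore moves a code by at most four bits, while unrelated identifiers share few or no bits.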
📝 Abstract
Alphanumeric identifiers such as manufacturer part numbers (MPNs), SKUs, and model codes are ubiquitous in e-commerce catalogs and search. These identifiers are sparse, non-linguistic, and highly sensitive to tokenization and typographical variation, rendering conventional lexical and embedding-based retrieval methods ineffective. We propose a training-free, character-level retrieval framework that encodes each alphanumeric sequence as a fixed-length binary vector. This representation enables efficient similarity computation via Hamming distance and supports nearest neighbor retrieval over large identifier corpora. An optional re-ranking stage using edit distance refines precision while preserving latency guarantees. The method offers a practical and interpretable alternative to learned dense retrieval models, making it suitable for production deployment in search suggestion generation systems. Significant gains in business metrics in an A/B test further demonstrate the utility of our approach.
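The abstract describes the re-ranking stage only as "using edit distance". A minimal sketch, assuming the Hamming stage has already produced a short list of `(identifier, hamming_distance)` pairs: compute the Levenshtein distance between the upper-cased query and each candidate, then sort by edit distance with Hamming distance as a tie-breaker. The functions `levenshtein` and `rerank`, the tie-breaking order, and the default `k=5` are hypothetical choices for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via a two-row DP."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop; distance is symmetric
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (no cost when chars match)
            ))
        prev = curr
    return prev[-1]


def rerank(query: str, shortlist: list[tuple[str, int]], k: int = 5) -> list[str]:
    """Hypothetical re-ranker: order Hamming-stage candidates by edit distance to the query."""
    q = query.upper()
    ranked = sorted(
        shortlist,
        # Primary key: edit distance; ties broken by Hamming distance, then alphabetically.
        key=lambda item: (levenshtein(q, item[0].upper()), item[1], item[0]),
    )
    return [identifier for identifier, _ in ranked[:k]]


if __name__ == "__main__":
    shortlist = [("ABC-1234", 3), ("ABC123", 4), ("XBC124", 2)]
    print(rerank("ABC124", shortlist))  # ['XBC124', 'ABC123', 'ABC-1234']
```

Running the quadratic edit-distance computation only on a short candidate list, rather than on the whole corpus, is what keeps a stage like this cheap: its cost grows with shortlist size and identifier length, not with corpus size.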
Problem

Research questions and friction points this paper is trying to address.

alphanumeric queries
search suggestions
e-commerce search
identifier retrieval
tokenization sensitivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

character-level retrieval
binary vector encoding
Hamming distance
alphanumeric identifiers
training-free search
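Binary vector encoding and Hamming distance pair naturally with an index that avoids scanning every code. The summary mentions approximate nearest neighbor search but not which index is used. One standard option for Hamming space, swapped in here as an illustration rather than taken from the paper, is multi-index hashing: split each code into `num_bands` disjoint bit bands and keep one hash table per band. By the pigeonhole principle, any code within Hamming distance `r < num_bands` of the query agrees with it exactly on at least one band, so band lookups find every such neighbor while skipping most of the corpus. The class `MultiIndexHamming` and its parameters are hypothetical; codes are assumed to be non-negative ints of at most `num_bits` bits.

```python
from collections import defaultdict


class MultiIndexHamming:
    """Hypothetical multi-index hash over fixed-length binary codes stored as ints."""

    def __init__(self, num_bits: int = 64, num_bands: int = 4):
        if num_bits % num_bands:
            raise ValueError("num_bits must be divisible by num_bands")
        self.num_bands = num_bands
        self.band_width = num_bits // num_bands
        self.mask = (1 << self.band_width) - 1
        self.tables = [defaultdict(list) for _ in range(num_bands)]
        self.codes = {}  # identifier -> full code, used for exact distance checks

    def _bands(self, code: int) -> list[int]:
        # Slice the code into num_bands disjoint chunks of band_width bits each.
        return [(code >> (b * self.band_width)) & self.mask for b in range(self.num_bands)]

    def add(self, identifier: str, code: int) -> None:
        self.codes[identifier] = code
        for table, key in zip(self.tables, self._bands(code)):
            table[key].append(identifier)

    def search(self, code: int, radius: int) -> list[tuple[str, int]]:
        # Candidates share at least one band exactly with the query; this is
        # guaranteed to include every code within `radius` when radius < num_bands.
        candidates = set()
        for table, key in zip(self.tables, self._bands(code)):
            candidates.update(table.get(key, ()))
        hits = []
        for identifier in candidates:
            dist = bin(self.codes[identifier] ^ code).count("1")
            if dist <= radius:
                hits.append((identifier, dist))
        return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    index = MultiIndexHamming(num_bits=16, num_bands=4)
    index.add("A", 0b0000_0000_0000_0000)
    index.add("B", 0b0000_0000_0000_0011)
    index.add("C", 0xFFFF)
    print(index.search(0, radius=3))  # [('A', 0), ('B', 2)]
```

With `radius >= num_bands` the band lookup can miss neighbors, which is where the approximation trade-off enters. More bands make the lookup exact for larger radii, at the cost of narrower, less selective bands that return more candidates to check.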