Purely Semantic Indexing for LLM-based Generative Recommendation and Retrieval

📅 2025-09-19
🤖 AI Summary
Existing semantic ID methods for LLM-based generative recommendation and retrieval suffer from semantic ID conflicts, where semantically similar items are assigned identical identifiers. The conventional fix appends a non-semantic token, which introduces randomness and enlarges the search space. This paper instead proposes purely semantic indexing. Working in a centroid space built by semantic clustering, it relaxes strict nearest-centroid selection so that every item receives a unique ID made only of semantic tokens, and it introduces two model-agnostic algorithms: exhaustive candidate matching (ECM) and recursive residual searching (RRS). Experiments on sequential recommendation, product search, and document retrieval show improvements in both overall and cold-start performance.

📝 Abstract
Semantic identifiers (IDs) have proven effective in adapting large language models for generative recommendation and retrieval. However, existing methods often suffer from semantic ID conflicts, where semantically similar documents (or items) are assigned identical IDs. A common strategy to avoid conflicts is to append a non-semantic token to distinguish them, which introduces randomness and expands the search space, thereby hurting performance. In this paper, we propose purely semantic indexing to generate unique, semantic-preserving IDs without appending non-semantic tokens. We enable unique ID assignment by relaxing the strict nearest-centroid selection and introduce two model-agnostic algorithms: exhaustive candidate matching (ECM) and recursive residual searching (RRS). Extensive experiments on sequential recommendation, product search, and document retrieval tasks demonstrate that our methods improve both overall and cold-start performance, highlighting the effectiveness of ensuring ID uniqueness.

Problem

Research questions and friction points this paper is trying to address.

Resolving semantic ID conflicts in LLM-based recommendation systems
Eliminating non-semantic token appending that introduces randomness
Generating unique semantic-preserving IDs without performance degradation
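
To make the conflict problem concrete, here is a minimal sketch in Python. It assumes a hypothetical two-level codebook with residual-style assignment, which is only one possible way to build semantic IDs; the summary does not specify the construction. The codebook values, item embeddings, and function names are all invented for illustration. The sketch shows two nearby items colliding under strict nearest-centroid selection, and the baseline fix of appending a counter token that carries no meaning. With L levels of K centroids there are at most K^L IDs; a suffix position with S possible values multiplies the number of sequences a decoder could emit by S.

```python
from collections import defaultdict

def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (squared L2)."""
    def dist(i):
        return sum((p - c) ** 2 for p, c in zip(point, centroids[i]))
    return min(range(len(centroids)), key=dist)

def strict_id(vec, codebooks):
    """One token per level: pick the nearest centroid, then pass on the residual."""
    residual, tokens = list(vec), []
    for centroids in codebooks:
        i = nearest(residual, centroids)
        tokens.append(i)
        residual = [r - c for r, c in zip(residual, centroids[i])]
    return tuple(tokens)

def find_conflicts(ids):
    """Group item names by ID and keep only IDs shared by more than one item."""
    groups = defaultdict(list)
    for name, sid in ids.items():
        groups[sid].append(name)
    return {sid: names for sid, names in groups.items() if len(names) > 1}

def add_suffix(ids):
    """Baseline fix: append a counter token that carries no semantic meaning."""
    seen = defaultdict(int)
    out = {}
    for name, sid in ids.items():
        out[name] = sid + (seen[sid],)
        seen[sid] += 1
    return out

# Toy data (hypothetical): 2 levels x 2 centroids, four 2-D item embeddings.
CODEBOOKS = [[(0.0, 0.0), (10.0, 10.0)], [(0.0, 0.0), (1.0, 1.0)]]
ITEMS = {"a": (0.1, 0.2), "b": (0.2, 0.1), "c": (10.9, 11.0), "d": (1.0, 0.9)}

ids = {name: strict_id(vec, CODEBOOKS) for name, vec in ITEMS.items()}
conflicts = find_conflicts(ids)   # items "a" and "b" collide on the same ID
suffixed = add_suffix(ids)        # unique, but the last token is arbitrary
```

In this toy setup, whether "a" or "b" receives suffix 0 depends only on insertion order, which is the kind of randomness the list above refers to.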

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates unique semantic IDs without non-semantic tokens
Relaxes strict nearest-centroid selection for assignment
Introduces ECM and RRS model-agnostic algorithms
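
The general idea of relaxing nearest-centroid selection can be sketched as follows. This is not the paper's ECM or RRS, whose details are not given in this summary; it is a simple greedy illustration under stated assumptions. It assumes a hypothetical two-level residual codebook, considers the `top_k` nearest centroids at each level, ranks the resulting candidate IDs by reconstruction error, and gives each item the best candidate that is still free. All names and values (`candidates`, `relaxed_assign`, `top_k`, the toy data) are invented for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def candidates(vec, codebooks, top_k):
    """Enumerate IDs that use one of the top_k nearest centroids at each level,
    sorted by reconstruction error (squared norm of the final residual)."""
    paths = [((), tuple(vec))]                      # (tokens so far, residual)
    for centroids in codebooks:
        extended = []
        for tokens, residual in paths:
            ranked = sorted(range(len(centroids)),
                            key=lambda i: sq_dist(residual, centroids[i]))
            for i in ranked[:top_k]:
                new_residual = tuple(r - c for r, c in zip(residual, centroids[i]))
                extended.append((tokens + (i,), new_residual))
        paths = extended
    paths.sort(key=lambda p: sum(r * r for r in p[1]))
    return [tokens for tokens, _ in paths]

def relaxed_assign(items, codebooks, top_k=2):
    """Greedy assumption: process items in order, take the best unused candidate."""
    taken, ids = set(), {}
    for name, vec in items.items():
        for cand in candidates(vec, codebooks, top_k):
            if cand not in taken:
                taken.add(cand)
                ids[name] = cand
                break
        else:
            raise ValueError(f"no free ID for {name!r}; try a larger top_k")
    return ids

# Toy data (hypothetical): level 1 has 2 centroids, level 2 has 3.
CODEBOOKS = [[(0.0, 0.0), (10.0, 10.0)],
             [(0.0, 0.0), (0.3, 0.0), (1.0, 1.0)]]
ITEMS = {"a": (0.1, 0.2), "b": (0.1, 0.1), "c": (10.9, 11.0), "d": (1.0, 0.9)}

# top_k=1 reduces to strict nearest-centroid selection, which collides on a/b.
strict = {name: candidates(vec, CODEBOOKS, 1)[0] for name, vec in ITEMS.items()}
ids = relaxed_assign(ITEMS, CODEBOOKS, top_k=2)
```

Here "b" falls back to its second-best level-2 centroid, so every ID stays the same length and every token is still a centroid index. The greedy order dependence is a simplification of this sketch, not a property claimed for the paper's algorithms.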