BBQRec: Behavior-Bind Quantization for Multi-Modal Sequential Recommendation

๐Ÿ“… 2025-04-09
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing multi-modal sequential recommendation methods suffer from two critical limitations when discretizing modalities into semantic IDs: (1) fragmented quantization decouples modality mapping from behavioral objectives, and (2) excessive reliance on discrete IDs undermines cross-modal semantic consistency, thereby weakening user preference modeling. To address these issues, the paper proposes a behavior-anchored dual-alignment quantization framework, the first to jointly align behavioral and semantic spaces through contrastive codebook learning. It further designs a discretization similarity reweighting mechanism grounded in quantized semantic relations, balancing modality synergy with compatibility to the sequence model. Extensive experiments on four real-world benchmarks show significant improvements over state-of-the-art methods, supporting the claim that behavior-driven discretization enhances both recommendation accuracy and generalizability.

๐Ÿ“ Abstract
Multi-modal sequential recommendation systems leverage auxiliary signals (e.g., text, images) to alleviate data sparsity in user-item interactions. While recent methods exploit large language models to encode modalities into discrete semantic IDs for autoregressive prediction, we identify two critical limitations: (1) Existing approaches adopt fragmented quantization, where modalities are independently mapped to semantic spaces misaligned with behavioral objectives, and (2) Over-reliance on semantic IDs disrupts inter-modal semantic coherence, thereby weakening the expressive power of multi-modal representations for modeling diverse user preferences. To address these challenges, we propose a Behavior-Bind multi-modal Quantization for Sequential Recommendation (BBQRec for short) featuring dual-aligned quantization and semantics-aware sequence modeling. First, our behavior-semantic alignment module disentangles modality-agnostic behavioral patterns from noisy modality-specific features through contrastive codebook learning, ensuring semantic IDs are inherently tied to recommendation tasks. Second, we design a discretized similarity reweighting mechanism that dynamically adjusts self-attention scores using quantized semantic relationships, preserving multi-modal synergies while avoiding invasive modifications to the sequence modeling architecture. Extensive evaluations across four real-world benchmarks demonstrate BBQRec's superiority over the state-of-the-art baselines.
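The abstract's first contribution, tying semantic IDs to behavior via contrastive codebook learning, can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the function names (`quantize`, `behavior_alignment_loss`), the single flat codebook, and the InfoNCE-style loss pairing each item's quantized modality code with its own behavior embedding are all assumptions made for illustration.

```python
import numpy as np

def quantize(x, codebook):
    # Nearest-neighbor assignment: map each modality embedding (N, D)
    # to its closest codebook entry (K, D); the index is the semantic ID.
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    ids = dists.argmin(1)
    return ids, codebook[ids]

def behavior_alignment_loss(quantized, behavior, temperature=0.1):
    # InfoNCE-style contrastive loss (an illustrative stand-in for the
    # paper's behavior-semantic alignment objective): each item's quantized
    # code should score highest against its own behavior embedding.
    q = quantized / np.linalg.norm(quantized, axis=1, keepdims=True)
    b = behavior / np.linalg.norm(behavior, axis=1, keepdims=True)
    logits = q @ b.T / temperature           # (N, N) similarity matrix
    logits -= logits.max(1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
    return -np.log(np.diag(probs)).mean()    # diagonal = positive pairs
```

In training, the alignment loss would be minimized jointly with the usual codebook commitment terms, so that the learned IDs reflect behavioral similarity rather than raw modality similarity alone.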
Problem

Research questions and friction points this paper is trying to address.

Fragmented quantization misaligns modalities with behavioral objectives
Over-reliance on semantic IDs disrupts inter-modal coherence
Weak multi-modal representation limits diverse preference modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Behavior-semantic alignment for recommendation tasks
Contrastive codebook learning to disentangle patterns
Discretized similarity reweighting for self-attention adjustment
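The reweighting idea above, adjusting self-attention scores with quantized semantic relations rather than modifying the architecture, can be sketched as follows. Everything here is an assumption for illustration: the additive bias, the `alpha` scale, and measuring relatedness as the fraction of semantic IDs two items share across codebook levels are hypothetical choices, not the paper's exact formulation.

```python
import numpy as np

def reweighted_attention(q, k, v, sem_ids, alpha=1.0):
    # Scaled dot-product attention over a sequence of N items, with an
    # additive bias from discretized semantic similarity: positions whose
    # items share more semantic IDs (one per codebook level, shape (N, L))
    # receive proportionally higher attention scores.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                              # (N, N)
    sim = (sem_ids[:, None, :] == sem_ids[None, :, :]).mean(-1)  # shared-ID fraction
    scores = scores + alpha * sim                              # non-invasive bias
    scores -= scores.max(axis=1, keepdims=True)                # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights
```

Because the bias is added to the score matrix before the softmax, the sequence model's architecture is untouched: setting `alpha=0` recovers plain self-attention.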
๐Ÿ”Ž Similar Papers
No similar papers found.