🤖 AI Summary
This work addresses the issue of positional collapse in multimodal recommendation systems, where enforcing a unified embedding space often obscures modality-specific structures and leads to ID dominance. To mitigate this, the authors propose AnchorRec, a framework that decouples cross-modal alignment from representation learning by introducing an anchor-guided indirect alignment mechanism within a lightweight projection space. This approach preserves the intrinsic structure of each modality while maintaining cross-modal consistency, thereby avoiding the information loss typically incurred by direct alignment. Extensive experiments on four Amazon datasets demonstrate that AnchorRec achieves competitive Top-N recommendation performance. Qualitative analyses further confirm that its learned representations exhibit enhanced distinctiveness and coherence across modalities.
📝 Abstract
Multimodal recommender systems (MMRS) leverage images, text, and interaction signals to enrich item representations. However, recent alignment-based MMRSs that enforce a unified embedding space often blur modality-specific structures and exacerbate ID dominance. We therefore propose AnchorRec, a multimodal recommendation framework that performs indirect, anchor-based alignment in a lightweight projection domain. By decoupling alignment from representation learning, AnchorRec preserves each modality's native structure while maintaining cross-modal consistency and avoiding positional collapse. Experiments on four Amazon datasets show that AnchorRec achieves competitive top-N recommendation accuracy, while qualitative analyses demonstrate improved multimodal expressiveness and coherence. The codebase of AnchorRec is available at https://github.com/hun9008/AnchorRec.
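The abstract does not spell out the alignment mechanism, but the core idea of anchor-guided *indirect* alignment can be sketched as follows: instead of pulling image and text embeddings directly toward each other (which can collapse modality-specific structure), each modality is softly assigned to a small set of shared anchors in a projection space, and only the two anchor-assignment distributions are aligned. The sketch below is a hypothetical NumPy illustration of this pattern; the anchor count, temperature `tau`, and symmetric-KL objective are our assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def anchor_alignment_loss(z_img, z_txt, anchors, tau=0.1):
    """Illustrative indirect alignment (hypothetical, not AnchorRec's exact loss).

    Each modality's projected embeddings (N x d) are softly assigned to K shared
    anchors (K x d) via cosine similarity; the loss pulls the two assignment
    distributions together without directly aligning the embeddings themselves.
    """
    def assign(z):
        z = z / np.linalg.norm(z, axis=1, keepdims=True)
        a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
        return softmax(z @ a.T / tau, axis=1)  # N x K anchor assignments

    p, q = assign(z_img), assign(z_txt)
    eps = 1e-9
    kl = lambda p, q: np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
    # Symmetric KL keeps the objective even-handed between modalities.
    return float(np.mean(0.5 * (kl(p, q) + kl(q, p))))
```

Because gradients flow only through the anchor assignments, each modality's embedding geometry is free to keep its native structure, which is the intuition behind decoupling alignment from representation learning.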