VisualLens: Personalization through Visual History

📅 2024-11-25
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing recommender systems rely heavily on interaction logs or textual signals, limiting generalizability to multimodal scenarios and leaving users' visual histories, such as casually captured personal images, largely underexploited. This work proposes the first recommendation framework to leverage unlabeled, noisy, and highly diverse user visual histories. Methodologically, it integrates multi-scale visual feature extraction, interest-aware image filtering, and cross-modal representation alignment to automatically discover task-agnostic, fine-grained preferences from raw visual sequences. Evaluated on two newly constructed multimodal recommendation benchmarks, the approach achieves a 5-10% improvement in Hit@3 over state-of-the-art methods and outperforms GPT-4o by 2-5%, demonstrating that visual history serves as an effective and scalable carrier of universal user preferences.

📝 Abstract
We hypothesize that a user's visual history, with images reflecting their daily life, offers valuable insights into their interests and preferences, and can be leveraged for personalization. Among the many challenges in achieving this goal, the foremost is the diversity and noise in the visual history, which contains images not necessarily related to a recommendation task, not necessarily reflecting the user's interests, and sometimes not preference-relevant at all. Existing recommendation systems either rely on task-specific user interaction logs, such as online shopping history for shopping recommendations, or focus on text signals. We propose a novel approach, VisualLens, that extracts, filters, and refines image representations, and leverages these signals for personalization. We created two new benchmarks with task-agnostic visual histories, and show that our method improves over state-of-the-art recommendations by 5-10% on Hit@3, and over GPT-4o by 2-5%. Our approach paves the way for personalized recommendations in scenarios where traditional methods fail.
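The extract-filter-rank idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's actual architecture: it assumes image and candidate embeddings are already available (the paper uses MLLM-derived features), and the function name, filtering heuristic, and parameters below are hypothetical.

```python
import numpy as np

def recommend(history_embs, candidate_embs, k=3, keep_ratio=0.5):
    """Hypothetical sketch: filter noisy history images, aggregate the
    rest into a user profile, then rank candidates against it."""
    # Filter: score each history image against the candidate pool's
    # centroid and keep only the most task-relevant fraction.
    centroid = candidate_embs.mean(axis=0)
    relevance = history_embs @ centroid
    n_keep = max(1, int(len(history_embs) * keep_ratio))
    keep = np.argsort(relevance)[-n_keep:]
    # Refine: aggregate retained images into a single profile vector.
    profile = history_embs[keep].mean(axis=0)
    # Rank: order candidates by similarity to the profile; the top-k
    # list is what a Hit@k metric would evaluate.
    scores = candidate_embs @ profile
    return np.argsort(scores)[::-1][:k]
```

In this toy form the filtering step is a crude relevance cut; the actual method's interest-aware filtering and cross-modal alignment are learned, but the control flow (extract, filter, refine, rank) is the same.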
Problem

Research questions and friction points this paper is trying to address.

Leveraging visual history for task-agnostic personalization in recommendations
Overcoming limitations of item-based and text-only recommendation systems
Extracting user profiles from daily life images for multimodal recommendations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages MLLMs for task-agnostic visual history personalization
Extracts, filters, and refines user profiles from visual data
Outperforms state-of-the-art recommenders by 5-10% on Hit@3 and GPT-4o by 2-5%