VisualLens: Personalization through Visual History

📅 2024-11-25
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing recommender systems rely heavily on interaction logs or textual signals, limiting generalizability to multimodal scenarios and leaving users' visual histories, such as casually captured personal images, largely underexploited. This work proposes the first recommendation framework to leverage unlabeled, noisy, and highly diverse user visual histories. Methodologically, it integrates multi-scale visual feature extraction, interest-aware image filtering, and cross-modal representation alignment to automatically discover task-agnostic, fine-grained preferences from raw visual sequences. Evaluated on two newly constructed multimodal recommendation benchmarks, the approach achieves a 5-10% improvement in Hit@3 over state-of-the-art methods and outperforms GPT-4o by 2-5%, demonstrating that visual history serves as an effective and scalable carrier of universal user preferences.

📝 Abstract
We hypothesize that a user's visual history, with images reflecting their daily life, offers valuable insights into their interests and preferences, and can be leveraged for personalization. Among the many challenges in achieving this goal, the foremost is the diversity and noise in the visual history, which contains images not necessarily related to a recommendation task, not necessarily reflecting the user's interests, and sometimes not preference-relevant at all. Existing recommendation systems either rely on task-specific user interaction logs, such as online shopping history for shopping recommendations, or focus on text signals. We propose a novel approach, VisualLens, that extracts, filters, and refines image representations, and leverages these signals for personalization. We created two new benchmarks with task-agnostic visual histories, and show that our method improves over state-of-the-art recommendations by 5-10% on Hit@3, and over GPT-4o by 2-5%. Our approach paves the way for personalized recommendations in scenarios where traditional methods fail.
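The extract-filter-rank idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's actual architecture: it assumes image and candidate embeddings are already available (the paper uses MLLM-derived features), and the function name, filtering heuristic, and parameters below are hypothetical.

```python
import numpy as np

def recommend(history_embs, candidate_embs, k=3, keep_ratio=0.5):
    """Hypothetical sketch: filter noisy history images, aggregate the
    rest into a user profile, then rank candidates against it."""
    # Filter: score each history image against the candidate pool's
    # centroid and keep only the most task-relevant fraction.
    centroid = candidate_embs.mean(axis=0)
    relevance = history_embs @ centroid
    n_keep = max(1, int(len(history_embs) * keep_ratio))
    keep = np.argsort(relevance)[-n_keep:]
    # Refine: aggregate retained images into a single profile vector.
    profile = history_embs[keep].mean(axis=0)
    # Rank: order candidates by similarity to the profile; the top-k
    # list is what a Hit@k metric would evaluate.
    scores = candidate_embs @ profile
    return np.argsort(scores)[::-1][:k]
```

In this toy form the filtering step is a crude relevance cut; the actual method's interest-aware filtering and cross-modal alignment are learned, but the control flow (extract, filter, refine, rank) is the same.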
Problem

Research questions and friction points this paper is trying to address.

Leveraging visual history for task-agnostic personalization in recommendations
Overcoming limitations of item-based and text-only recommendation systems
Extracting user profiles from daily life images for multimodal recommendations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages MLLMs for task-agnostic visual history personalization
Extracts, filters, and refines user profiles from visual data
Outperforms state-of-the-art recommenders by 5-10% on Hit@3 and GPT-4o by 2-5%