Personalization Toolkit: Training Free Personalization of Large Vision Language Models

📅 2025-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Personalizing large vision-language models (LVLMs) typically requires time-consuming test-time training for each user and object, which hinders practical deployment. Method: This paper proposes a training-free personalization approach for LVLMs. It combines feature extraction with pre-trained vision foundation models, retrieval-augmented generation (RAG) to recognize instances in the visual input, and visual prompting into a model-agnostic personalization toolkit. No parameter updates are required, so user-specific object recognition and tailored response generation are available without retraining. Contribution/Results: The authors report state-of-the-art results on personalized vision-language tasks, outperforming conventional training-based approaches, and present the method as a new standard for LVLM personalization.

📝 Abstract
Large Vision Language Models (LVLMs) have significant potential to deliver personalized assistance by adapting to individual users' unique needs and preferences. Personalization of LVLMs is an emerging area that involves customizing models to recognize specific object instances and provide tailored responses. However, existing approaches rely on time-consuming test-time training for each user and object, rendering them impractical. This paper proposes a novel, training-free approach to LVLM personalization by leveraging pre-trained vision foundation models to extract distinct features, retrieval-augmented generation (RAG) techniques to recognize instances in the visual input, and visual prompting methods. Our model-agnostic vision toolkit enables flexible and efficient personalization without extensive retraining. We demonstrate state-of-the-art results, outperforming conventional training-based approaches and establishing a new standard for LVLM personalization.
Problem

Research questions and friction points this paper is trying to address.

Personalization of LVLMs without retraining
Efficient recognition of specific object instances
Flexible personalization using pre-trained models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free LVLM personalization technique
Uses pre-trained vision foundation models
Incorporates retrieval-augmented generation methods
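
The retrieval step named above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Instance` record, the `retrieve` and `build_prompt` helpers, the 0.8 similarity threshold, and the toy three-dimensional embeddings are all hypothetical, and in practice the embeddings would come from a pre-trained vision foundation model. The sketch only shows the general idea of matching a query feature against a small memory of user-provided instances and passing the match to the LVLM as prompt context, with no parameter updates.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    name: str               # user-given name of the personal object
    context: str            # user-provided description of the object
    embedding: List[float]  # reference feature (toy values; assumed to come from a vision model)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: List[float], memory: List[Instance],
             threshold: float = 0.8) -> Optional[Instance]:
    """Return the most similar stored instance, or None if below the threshold."""
    best = max(memory, key=lambda inst: cosine(query, inst.embedding), default=None)
    if best is None or cosine(query, best.embedding) < threshold:
        return None
    return best


def build_prompt(question: str, match: Optional[Instance]) -> str:
    """Prepend retrieved instance information to the user's question."""
    if match is None:
        return question  # no personal object recognized; ask the question as-is
    return f"The highlighted object is '{match.name}'. {match.context}\nQuestion: {question}"


# Hypothetical memory of two personal objects with toy embeddings.
memory = [
    Instance("Mochi", "Mochi is the user's cat.", [0.9, 0.1, 0.0]),
    Instance("Blue mug", "The user's favorite coffee mug.", [0.0, 0.2, 0.9]),
]

query_embedding = [0.8, 0.2, 0.1]  # toy feature of an object seen in a new image
match = retrieve(query_embedding, memory)
prompt = build_prompt("What is it doing?", match)
```

Because the memory is only a list of stored features and descriptions, adding a new personal object in this sketch means appending one record rather than retraining anything, which is the property the training-free framing relies on.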
Soroush Seifi
Computer Vision Researcher at Toyota Motor Europe
Computer Vision, Machine Learning, Artificial Intelligence
Vaggelis Dorovatas
Toyota Motor Europe, Hoge Wei 33B, 1930, Zaventem, Belgium
Daniel Olmeda Reino
Toyota Motor Europe, Hoge Wei 33B, 1930, Zaventem, Belgium
Rahaf Aljundi
Senior Researcher at Toyota Motor Europe
Machine learning, Computer vision