🤖 AI Summary
Existing personalized multimodal large language models (MLLMs) are limited to short contexts and static concept substitution, lacking the capacity for sustained, evolution-aware personalization in long dialogues. To address this, we introduce LCMP, the first long-context personalization evaluation benchmark, and propose TAME, a training-free framework. TAME features a novel dual-memory mechanism: one memory captures the temporal evolution of personalized concepts, while the other encodes their long-term stable attributes. Further, we propose the Retrieve-then-Align Augmented Generation (RA2G) paradigm, which dynamically aligns heterogeneous memory sources with the current query before generation. Experiments on LCMP demonstrate that TAME significantly improves response accuracy, cross-turn consistency, and adaptability to concept evolution. Our work establishes a new paradigm for personalized MLLMs in extended conversational settings and provides a reproducible, standardized evaluation foundation.
📝 Abstract
Multimodal Large Language Model (MLLM) Personalization is a critical research problem: enabling personalized dialogues with MLLMs about specific entities (known as personalized concepts). However, existing methods and benchmarks focus on simple, context-agnostic visual identification and textual replacement of personalized concepts (e.g., "A yellow puppy" -> "Your puppy Mochi"), overlooking the ability to support long-context conversations. An ideal personalized MLLM assistant can engage in long-context dialogues with humans and continually improve the interaction quality by learning from past dialogue histories. To bridge this gap, we propose LCMP, the first Long-Context MLLM Personalization evaluation benchmark. LCMP assesses the capability of MLLMs to perceive variations of personalized concepts and to generate contextually appropriate personalized responses that reflect these variations. As a strong baseline for LCMP, we introduce TAME, a novel training-free and state-aware framework. TAME endows MLLMs with dual memories that manage the temporal and persistent variations of each personalized concept in a differentiated manner. In addition, TAME incorporates a new training-free Retrieve-then-Align Augmented Generation (RA2G) paradigm. RA2G introduces an alignment step that extracts, from the knowledge retrieved across both memories, the information that contextually fits the current question, enabling better interactions for complex real-world user queries. Experiments on LCMP demonstrate that TAME achieves the best performance, delivering strong and continually improving interaction experiences in long-context scenarios.
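The dual-memory and retrieve-then-align ideas described above can be illustrated with a minimal sketch. Note that this is an assumption-laden toy, not the paper's implementation: the class and function names (`DualMemory`, `align`), the keyword-overlap alignment heuristic, and the data layout are all hypothetical stand-ins for TAME's actual retrieval and alignment components, which the abstract does not specify.

```python
from dataclasses import dataclass, field

@dataclass
class DualMemory:
    """Hypothetical dual-memory store: temporal entries track recent state
    changes of a concept; persistent entries hold long-term stable attributes."""
    temporal: dict = field(default_factory=dict)    # concept -> [(turn, observation)]
    persistent: dict = field(default_factory=dict)  # concept -> {stable attribute}

    def observe(self, concept, turn, observation, stable=False):
        # Route each observation to the appropriate memory.
        if stable:
            self.persistent.setdefault(concept, set()).add(observation)
        else:
            self.temporal.setdefault(concept, []).append((turn, observation))

    def retrieve(self, concept, k=3):
        # Retrieve the latest k temporal states plus all persistent attributes.
        recent = sorted(self.temporal.get(concept, []))[-k:]
        return recent, sorted(self.persistent.get(concept, set()))

def align(query_keywords, recent, persistent):
    # Alignment step (toy heuristic): keep only the retrieved facts that
    # overlap the current query's keywords, so the generation prompt contains
    # contextually fitting knowledge rather than everything retrieved.
    kw = set(query_keywords)
    fits = [obs for _, obs in recent if kw & set(obs.split())]
    fits += [attr for attr in persistent if kw & set(attr.split())]
    return fits

# Usage: the concept "Mochi" evolves across turns while one trait stays stable.
mem = DualMemory()
mem.observe("Mochi", 1, "Mochi is a yellow puppy", stable=True)
mem.observe("Mochi", 5, "Mochi got wet in the rain")
mem.observe("Mochi", 9, "Mochi was groomed and is fluffy now")
recent, persistent = mem.retrieve("Mochi")
context = align(["fluffy", "puppy"], recent, persistent)
# `context` now holds only the evolution-aware and stable facts relevant
# to the query, ready to be prepended to the MLLM's generation prompt.
```

The point of the separation is that temporal entries can be superseded as a concept evolves, while persistent attributes survive across the whole dialogue; the align step then filters both against the current query instead of dumping all retrieved memory into the prompt.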