Modality-Decoupled Online Recursive Editing

📅 2026-05-18
📈 Citations: 0
Influential: 0
📄 PDF

career value

198K/year
🤖 AI Summary
This work addresses the challenge of cross-modal interference and persistent disruption in online editing of multimodal large language models, often caused by vision-dominant activation. To mitigate this, the authors propose M-ORE, the first modality-disentangled online recursive editing framework. M-ORE maintains local statistical information for textual and visual modules separately within fixed orthogonal low-rank subspaces and integrates proximal projection optimization with Sherman-Morrison-based matrix recursion. This design ensures constant per-edit computational overhead while effectively suppressing interference. Experiments demonstrate that M-ORE significantly enhances editing reliability, generalization, and locality across multiple multimodal large models and online editing benchmarks, exhibiting superior quality-efficiency scalability.
📝 Abstract
Online model editing for multimodal large language models (MLLMs) requires assimilating a stream of corrections under tight compute and memory budgets. Yet editors developed for text-only LLMs often degrade on MLLMs: visually dominant activations skew the statistics that shape updates, causing cross-modal conflict, while sequential writes become entangled in a shared edit space and amplify long-horizon interference, causing inter-edit interference. To address these, we propose M-ORE, a modality-decoupled online recursive editor for lifelong MLLM adaptation. M-ORE is derived from a unified proximal-projection formulation and admits a closed-form update with a Sherman-Morrison recursion, yielding constant per-edit overhead. It maintains module-wise locality statistics for the text stack and the visual projector to avoid visually dominated update shaping and performs continual updates in a fixed orthogonal low-rank edit subspace via a Sherman-Morrison recursion to mitigate long-horizon interference. Experiments on multiple MLLM backbones and online editing benchmarks show that our M-ORE method consistently improves reliability, generality, and locality over strong baselines, while achieving favorable quality-efficiency scaling. Our code is publicly available at https://github.com/lab-klc/M-ORE.
Problem

Research questions and friction points this paper is trying to address.

online model editing
multimodal large language models
cross-modal conflict
long-horizon interference
inter-edit interference
Innovation

Methods, ideas, or system contributions that make the work stand out.

modality-decoupled
online recursive editing
multimodal large language models
Sherman-Morrison recursion
low-rank edit subspace
🔎 Similar Papers
2024-05-06Conference on Empirical Methods in Natural Language ProcessingCitations: 9