SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models

📅 2025-10-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the underexplored problem of editing abstract auditory attribute knowledge in Large Audio-Language Models (LALMs). Existing knowledge editing methods focus predominantly on the text and vision modalities and neglect audio-specific characteristics. To bridge this gap, the authors introduce SAKE, the first benchmark dedicated to evaluating auditory-attribute knowledge editing. SAKE targets updates to abstract auditory attributes and defines a four-dimensional evaluation protocol: reliability, generality, audio/text locality, and portability. A systematic evaluation of seven state-of-the-art editing methods on two LALMs reveals critical deficiencies in preserving intra-attribute knowledge unrelated to the edit, generalizing edits to cross-modal reasoning, and maintaining edits under sequential updates. The work thus provides a benchmark, a methodology, and a clear identification of the key challenges in extending multimodal knowledge editing into the audio domain.

📝 Abstract
Knowledge editing offers an efficient way to update model knowledge without full retraining, but prior work has concentrated almost exclusively on textual or visual modalities. We introduce SAKE, the first benchmark specifically designed for editing auditory attribute knowledge in Large Audio-Language Models (LALMs). Unlike factual updates, SAKE targets several abstract auditory attributes, capturing knowledge types that go beyond conventional textual and visual domains. We benchmark seven editing methods on two LALMs along four dimensions: reliability, generality, audio/text locality, and portability. Results highlight challenges such as preserving intra-attribute knowledge unrelated to the edit, generalizing edits to multimodal reasoning, and maintaining edits under sequential updates. SAKE provides a principled framework to study how knowledge editing extends to the auditory modality, opening new directions for maintaining and adapting LALMs in more diverse real-world scenarios.
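The four evaluation dimensions can each be read as accuracy over a different probe set. The sketch below illustrates that framing; the `model` callable and the probe-set structure are assumptions for illustration, not SAKE's actual interface or data format.

```python
# Hypothetical sketch of a four-dimension knowledge-editing evaluation
# (reliability, generality, locality, portability). Each dimension is
# scored as accuracy on its own probe set of (query, expected) pairs.

def accuracy(model, probes):
    """Fraction of probes the model answers as expected."""
    hits = sum(1 for query, expected in probes if model(query) == expected)
    return hits / len(probes) if probes else 0.0

def evaluate_edit(edited_model, probe_sets):
    """probe_sets maps each dimension name to (query, expected) pairs:
    - reliability: the edited attribute fact itself
    - generality:  rephrased, in-scope variants of the edit
    - locality:    unrelated audio/text knowledge that must stay unchanged
    - portability: downstream reasoning that should inherit the edit
    """
    return {dim: accuracy(edited_model, probes)
            for dim, probes in probe_sets.items()}
```

In a sequential-editing setting, the same probe sets would be re-scored after each subsequent edit to measure how well earlier edits are maintained.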
Problem

Research questions and friction points this paper is trying to address.

Editing auditory knowledge in audio-language models beyond text/vision
Benchmarking knowledge editing methods across four key dimensions
Addressing challenges in generalization and preservation during edits
Innovation

Methods, ideas, or system contributions that make the work stand out.

First benchmark for editing auditory attribute knowledge
Evaluates seven editing methods across four dimensions
Focuses on abstract auditory attributes beyond text