Composable Interventions for Language Models

📅 2024-07-09
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
Existing test-time interventions—such as knowledge editing, model compression, and machine unlearning—have evolved in isolation, with no standardized framework for systematically studying their interactions when applied jointly to the same model. Method: We propose the first composable intervention framework, unifying the study of interactions across these three intervention types. We design a comprehensive evaluation suite for intervention composability, introduce new metrics including the *composability score*, and implement a modular PyTorch-based system with cross-category pipeline scheduling. Contribution/Results: Our analysis of 310 intervention combinations reveals critical interaction patterns: strong order dependence, suppression of editing and unlearning efficacy by compression, and the failure of conventional single-intervention metrics in compositional settings. All code is fully open-sourced, establishing a foundation for multi-objective, cooperative intervention paradigms in foundation models.

📝 Abstract
Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining. But despite a flood of new methods, different types of interventions are largely developing independently. In practice, multiple interventions must be applied sequentially to the same model, yet we lack standardized ways to study how interventions interact. We fill this gap by introducing composable interventions, a framework to study the effects of using multiple interventions on the same language models, featuring new metrics and a unified codebase. Using our framework, we conduct extensive experiments and compose popular methods from three emerging intervention categories -- Knowledge Editing, Model Compression, and Machine Unlearning. Our results from 310 different compositions uncover meaningful interactions: compression hinders editing and unlearning, composing interventions hinges on their order of application, and popular general-purpose metrics are inadequate for assessing composability. Taken together, our findings showcase clear gaps in composability, suggesting a need for new multi-objective interventions. All of our code is public: https://github.com/hartvigsen-group/composable-interventions.
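The core experimental idea in the abstract is simple to sketch: apply two interventions to the same model in both orders and check whether each intervention's own success metric survives the composition. The toy "model" (a dict of weights), the magnitude-pruning stand-in for compression, and all function names below are illustrative assumptions, not the paper's actual API; the sketch only shows the kind of order dependence the results report.

```python
# Hypothetical sketch of a composition experiment: compression is mimicked
# by magnitude pruning, and a knowledge edit by overwriting one weight.

def compress(model):
    # Toy "compression": zero out weights with small magnitude.
    return {k: (0.0 if abs(v) < 0.5 else v) for k, v in model.items()}

def edit(model):
    # Toy "knowledge edit": write a targeted value into one weight.
    out = dict(model)
    out["fact"] = 0.3
    return out

def edit_success(model):
    # Per-intervention metric: did the edited value survive?
    return 1.0 if model.get("fact") == 0.3 else 0.0

def compose(model, interventions):
    # Apply interventions sequentially, as in a composed pipeline.
    for step in interventions:
        model = step(model)
    return model

base = {"fact": 0.2, "w1": 0.9, "w2": 0.1}

# Order matters: compressing after editing prunes away the edit,
# while editing after compressing leaves it intact.
print(edit_success(compose(base, [edit, compress])))   # 0.0
print(edit_success(compose(base, [compress, edit])))   # 1.0
```

This mirrors the abstract's finding that compression hinders editing and that composition hinges on order of application: the same two interventions yield opposite outcomes depending only on their sequence.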
Problem

Research questions and friction points this paper is trying to address.

Study interactions of multiple interventions on language models
Develop framework for composable interventions with new metrics
Identify gaps in composability and need for multi-objective interventions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces composable interventions framework for language models
Develops new metrics and unified codebase for intervention studies
Explores interactions among Knowledge Editing, Compression, Unlearning