🤖 AI Summary
This work addresses key challenges in adapting models to low-resource languages, including scarce labeled data, instability of full-model fine-tuning, and catastrophic forgetting in cross-lingual continual learning. The authors propose a circuit-oriented fine-tuning mechanism that operates without counterfactual interventions. By leveraging a label-balanced mean baseline and task-direction correlation scores, the method identifies sparse attention heads critical to the target task and applies gradient masking exclusively to these heads and their associated LayerNorm parameters for selective fine-tuning. Evaluated on NusaX-Senti and XNLI, this approach achieves substantial cross-lingual performance gains with minimal parameter updates and effectively mitigates catastrophic forgetting, outperforming full-model fine-tuning.
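The head-selection step described above can be sketched roughly as follows. This is an illustrative sketch, not the paper's exact formulation: the function names, the binary-label setup, and the use of Pearson correlation as the "task-direction correlation score" are all assumptions; the label-balanced mean baseline is taken from the summary (average per-class means so class imbalance does not skew the baseline).

```python
import numpy as np

def label_balanced_baseline(head_out, labels):
    """Label-balanced mean baseline (illustrative).

    head_out: (n_examples, d) activations of one attention head
    labels:   (n_examples,) class labels
    Averages the per-class means so majority classes do not dominate.
    """
    classes = np.unique(labels)
    class_means = np.stack([head_out[labels == c].mean(axis=0) for c in classes])
    return class_means.mean(axis=0)

def task_direction_relevance(head_out, labels, task_dir):
    """Score one head's task relevance (illustrative).

    Projects each example's deviation from the label-balanced baseline
    onto an assumed task direction, then correlates the projections with
    the labels (Pearson correlation is our stand-in for the paper's
    task-direction correlation score).
    """
    base = label_balanced_baseline(head_out, labels)
    proj = (head_out - base) @ task_dir
    return np.corrcoef(proj, labels)[0, 1]
```

Heads whose scores exceed a sparsity threshold would form the task circuit selected for fine-tuning.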
📝 Abstract
Adapting LLMs to low-resource languages is difficult: labeled data is scarce, full-model fine-tuning is unstable, and continued cross-lingual tuning can cause catastrophic forgetting. We propose Circuit-Targeted Supervised Fine-Tuning (CT-SFT): a counterfactual-free adaptation of CD-T (Contextual Decomposition Transformer) that uses a label-balanced mean baseline and task-directional relevance scoring to identify a sparse set of task-relevant attention heads in a proxy-language checkpoint, then transfers to a target language by updating only those heads (plus LayerNorm) via head-level gradient masking. Across NusaX-Senti and XNLI, CT-SFT improves cross-lingual accuracy over continued full fine-tuning while updating only a small subset of model parameters. We find an editing-versus-preserving trade-off: harder transfers favor editing the circuit heads, while easier transfers often favor updating near-zero (low-relevance) heads, preserving the source mechanism. CT-SFT also substantially reduces catastrophic forgetting, preserving proxy/source-language competence during transfer.
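The head-level gradient masking step might look like the following sketch. It is a minimal illustration on plain arrays, assuming attention parameters are laid out as contiguous per-head slices; the function names, the top-k selection rule, and the layout are assumptions, not the paper's implementation (which would operate on transformer weight tensors, with LayerNorm parameters left unmasked).

```python
import numpy as np

def select_task_heads(relevance, k):
    """Keep the k heads with the highest relevance scores (assumed rule).

    relevance: (num_layers, num_heads) task-direction relevance scores
    Returns a boolean mask of the same shape.
    """
    flat = relevance.ravel()
    top = np.argsort(flat)[-k:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[top] = True
    return mask.reshape(relevance.shape)

def mask_head_gradients(grad, head_mask, head_dim):
    """Zero gradients of unselected heads before the optimizer step.

    grad: (num_heads * head_dim, d_model) gradient of one layer's
          attention projection, with rows grouped per head (assumption)
    head_mask: per-head booleans for this layer
    """
    g = grad.copy()
    for h, keep in enumerate(head_mask):
        if not keep:
            g[h * head_dim:(h + 1) * head_dim, :] = 0.0
    return g
```

Applied every training step, this confines updates to the selected circuit heads; in a real setup the same effect could be obtained with per-parameter gradient hooks.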