Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning

📅 2025-07-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the gradient conflicts and negative transfer that arise from divergent task objectives in Transformer-based multi-task learning (MTL), this paper proposes Dynamic Token Modulation and Expansion (DTME-MTL). DTME-MTL is the first method to explicitly detect inter-task gradient conflicts in token space. Through lightweight, task-adaptive token modulation and sparse token expansion, it enables parameter-efficient, collaborative optimization across tasks within a shared encoder, without replicating parameters or architectural components. Crucially, it requires no modification to the backbone architecture and incurs negligible computational overhead. Extensive experiments on standard MTL benchmarks show consistent, significant gains in overall performance and per-task robustness, validating DTME-MTL's effectiveness, generalizability, and scalability across diverse multi-task settings.

📝 Abstract
Multi-Task Learning (MTL) enables multiple tasks to be learned within a shared network, but differences in objectives across tasks can cause negative transfer, where the learning of one task degrades another task's performance. While pre-trained transformers significantly improve MTL performance, their fixed network capacity and rigid structure limit adaptability. Previous dynamic network architectures attempt to address this but are inefficient as they directly convert shared parameters into task-specific ones. We propose Dynamic Token Modulation and Expansion (DTME-MTL), a framework applicable to any transformer-based MTL architecture. DTME-MTL enhances adaptability and reduces overfitting by identifying gradient conflicts in token space and applying adaptive solutions based on conflict type. Unlike prior methods that mitigate negative transfer by duplicating network parameters, DTME-MTL operates entirely in token space, enabling efficient adaptation without excessive parameter growth. Extensive experiments demonstrate that DTME-MTL consistently improves multi-task performance with minimal computational overhead, offering a scalable and effective solution for enhancing transformer-based MTL models.
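The abstract's core idea, identifying gradient conflicts in token space rather than in parameter space, can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the paper's implementation: it assumes per-task gradients with respect to the shared token embeddings are available and flags a token as conflicting when the two tasks' gradients at that token have negative cosine similarity (the standard notion of gradient conflict).

```python
import torch

def detect_token_conflict(grad_a, grad_b, threshold=0.0):
    """Flag tokens whose per-task gradients point in opposing directions.

    grad_a, grad_b: (num_tokens, dim) gradients of two task losses w.r.t.
    the shared token embeddings. A cosine similarity below `threshold`
    at a token marks a conflict localized to that token.
    """
    cos = torch.nn.functional.cosine_similarity(grad_a, grad_b, dim=-1)
    return cos < threshold  # boolean mask over tokens

# Toy example: token 0 agrees across the two tasks, token 1 conflicts.
ga = torch.tensor([[1.0, 0.0], [1.0, 0.0]])
gb = torch.tensor([[1.0, 0.1], [-1.0, 0.0]])
mask = detect_token_conflict(ga, gb)  # → tensor([False, True])
```

Because the mask is per token, an adaptive remedy can then be applied only where tasks actually disagree, which is what makes a token-space approach cheaper than duplicating network parameters.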
Problem

Research questions and friction points this paper is trying to address.

Resolving gradient conflicts in token space for MTL
Improving transformer adaptability without parameter duplication
Enhancing multi-task performance with minimal computational overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Token Modulation for gradient conflict resolution
Token space adaptation without parameter duplication
Scalable transformer-based multi-task learning enhancement
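The "token space adaptation without parameter duplication" idea above can be sketched as a small module. This is an assumed, illustrative design (the class name, additive form, and masking are my own, not taken from the paper): each task gets one lightweight modulation vector that is added only at tokens flagged as conflicting, so the shared backbone and its weights are left untouched.

```python
import torch
import torch.nn as nn

class TaskTokenModulation(nn.Module):
    """Hypothetical sketch: per-task additive token modulation, applied
    only at conflicting token positions; backbone weights stay shared."""

    def __init__(self, num_tasks, dim):
        super().__init__()
        # One modulation vector per task: num_tasks * dim extra
        # parameters, versus duplicating whole layers per task.
        self.delta = nn.Parameter(torch.zeros(num_tasks, dim))

    def forward(self, tokens, task_id, conflict_mask):
        # tokens: (num_tokens, dim); conflict_mask: (num_tokens,) bool
        mod = self.delta[task_id]  # (dim,) vector for this task
        return torch.where(conflict_mask.unsqueeze(-1),
                           tokens + mod, tokens)

# Usage: modulate only the tokens a conflict detector flagged.
m = TaskTokenModulation(num_tasks=2, dim=3)
tokens = torch.zeros(4, 3)
mask = torch.tensor([True, False, True, False])
out = m(tokens, task_id=0, conflict_mask=mask)
```

Since the modulation is a plain addition in token space, it composes with any transformer encoder without architectural changes, matching the paper's claim of negligible overhead.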