Learn-to-learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM

📅 2026-05-03
📈 Citations: 0
Influential: 0

🤖 AI Summary
Traditional large language models generalize poorly under corpus heterogeneity and fine-grained shifts in conditions. Fine-tuning often incurs catastrophic forgetting, and existing meta-learning approaches struggle to scale to large models. This work proposes a meta-control paradigm that treats the β parameter of the SwiGLU module as a learnable meta-signal, yielding an adaptive meta-gating mechanism that modulates the nonlinearity of the feedforward network. A hypernetwork dynamically generates β conditioned on arbitrary textual attributes (such as task, domain, persona, or style), unifying multidimensional textual conditions in a single control framework. The method outperforms standard fine-tuning and existing meta-learning baselines, and generalizes reasonably to unseen tasks, condition types, and instructions.
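For reference, the standard Swish and SwiGLU definitions show where β enters the feedforward network. The summary does not state these formulas, so the weight symbols $W$, $V$, $W_2$ are notational assumptions, and this is only a sketch of the usual formulation rather than the paper's exact parameterization:

$$
\mathrm{Swish}_\beta(z) = z\,\sigma(\beta z), \qquad
\mathrm{FFN}_{\mathrm{SwiGLU}}(x) = \bigl(\mathrm{Swish}_\beta(xW) \odot xV\bigr)\,W_2
$$

$$
\beta \to 0:\ \mathrm{Swish}_\beta(z) \to \tfrac{z}{2}, \qquad
\beta = 1:\ \mathrm{Swish}_\beta(z) = \mathrm{SiLU}(z), \qquad
\beta \to \infty:\ \mathrm{Swish}_\beta(z) \to \max(0, z)
$$

Varying β therefore moves the gate between a linear map and a ReLU-like one, which is the sense in which it can adjust the nonlinearity of the FFN.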
📝 Abstract
Conventional LLMs can struggle with corpus heterogeneity and subtle changes in conditions. Fine-tuning can cause catastrophic forgetting, while the application of meta-learning to LLMs is limited by its complexity and poor scalability. In this paper, we activate the meta-signal $\beta$ within the SwiGLU blocks, yielding a meta-gating mechanism that adaptively adjusts the nonlinearity of the FFN. A hypernetwork dynamically produces $\beta$ from textual conditions, providing meta-controllability over LLMs. Tested on different condition types such as task, domain, persona, and style, our method outperforms fine-tuning and meta-learning baselines and generalizes reasonably to unseen tasks, condition types, or instructions. Our code is available at https://github.com/AaronJi/MeGan.
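The meta-gating idea can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the single-scalar β, the one-layer softplus hypernetwork `hyper_beta`, the hand-set condition embeddings in `COND_EMB`, and all dimensions and weights are hypothetical choices made only to show how a condition-dependent β changes the output of a SwiGLU feedforward block.

```python
import math
import random

def swish(z, beta):
    """Swish_beta(z) = z * sigmoid(beta * z); beta = 1 is the usual SiLU."""
    return z / (1.0 + math.exp(-beta * z))

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def swiglu_ffn(x, W_gate, W_up, W_down, beta):
    """SwiGLU feedforward block whose gate nonlinearity is controlled by beta."""
    gate = [swish(g, beta) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

def hyper_beta(cond_emb, w, b):
    """Hypothetical hypernetwork: linear layer + softplus, so beta stays positive."""
    z = sum(wi * ci for wi, ci in zip(w, cond_emb)) + b
    return math.log1p(math.exp(z))

# Toy dimensions and random weights (assumptions for illustration only).
rng = random.Random(0)
d_model, d_hidden = 4, 8

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

W_gate = rand_matrix(d_hidden, d_model)
W_up = rand_matrix(d_hidden, d_model)
W_down = rand_matrix(d_model, d_hidden)

# Hand-set stand-ins for embeddings of textual conditions (hypothetical).
COND_EMB = {
    "task: summarization": [1.0, 0.0, 0.5],
    "style: formal": [-0.5, 1.0, 0.0],
}
w_hyper, b_hyper = [0.8, -0.6, 0.3], 0.0

x = [0.5, -1.0, 0.25, 2.0]  # one token's hidden state
outputs = {}
for cond, emb in COND_EMB.items():
    beta = hyper_beta(emb, w_hyper, b_hyper)
    outputs[cond] = (beta, swiglu_ffn(x, W_gate, W_up, W_down, beta))
    print(cond, "beta =", round(beta, 3))
```

In this sketch the FFN weights are shared across conditions and only β changes, so the same input produces different outputs under different textual conditions without any update to the base weights.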
Problem

Research questions and friction points this paper is trying to address.

corpus heterogeneity
catastrophic forgetting
meta-learning
textual conditioning
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

meta-gating
hypernetwork
textual conditioning
SwiGLU
large language models