Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional Specialized Generalist Models (SGMs) struggle to combine strong general-purpose capability with domain-specific expert performance, largely because current large language model (LLM) architectures lack a dedicated, task-guided memory mechanism. Method: The paper proposes a Task-Aware Memory (TAM) mechanism comprising a memory Trigger and Updater, enabling fine-tuning-free online parameter adaptation and dynamic contextual memory management. The resulting model, Nirvana, combines a linear-complexity architecture, self-supervised per-sample modeling, lightweight encoder-decoder codecs, and a frozen-backbone training strategy. Contribution/Results: Nirvana matches or outperforms baseline LLM architectures across multiple NLP benchmarks. It further surpasses both state-of-the-art LLMs and domain-specific models on MRI reconstruction and clinical report generation, demonstrating, for the first time, the seamless evolution of a general-purpose language model into a high-accuracy medical expert model.

📝 Abstract
Specialized Generalist Models (SGMs) aim to preserve broad capabilities while achieving expert-level performance in target domains. However, traditional LLM architectures, including Transformer, linear attention, and hybrid models, do not employ a specialized memory mechanism guided by task information. In this paper, we present Nirvana, an SGM with a specialized memory mechanism, linear time complexity, and test-time task information extraction. We propose the Task-Aware Memory Trigger ($\textit{Trigger}$), which flexibly adjusts the memory mechanism based on the current task's requirements. In Trigger, each incoming sample is treated as a self-supervised fine-tuning task, enabling Nirvana to adapt its task-related parameters on the fly under domain shifts. We also design the Specialized Memory Updater ($\textit{Updater}$), which dynamically memorizes the context under Trigger's guidance. We conduct experiments on both general language tasks and specialized medical tasks. On a variety of natural language modeling benchmarks, Nirvana achieves competitive or superior results compared to existing LLM architectures. To demonstrate the effectiveness of Trigger on specialized tasks, we evaluate Nirvana on a challenging medical task, Magnetic Resonance Imaging (MRI) reconstruction. We post-train lightweight codecs on paired electromagnetic signals and MRI images while keeping the Nirvana backbone frozen. Despite the frozen backbone, Trigger guides the model to adapt to the MRI domain by changing its task-related parameters. Nirvana achieves higher-quality MRI reconstruction than conventional MRI models and models built on traditional LLM backbones, and can also generate accurate preliminary clinical reports.
Problem

Research questions and friction points this paper is trying to address.

Developing specialized generalist models with task-aware memory mechanisms
Achieving expert performance while maintaining broad model capabilities
Adapting to domain shifts through dynamic memory parameter adjustment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-aware memory mechanism for specialized generalist models
Linear time complexity with test-time task extraction
Dynamic memory adaptation using self-supervised fine-tuning triggers
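The Trigger/Updater interplay described above can be sketched as a gated, linear-attention-style memory write: a self-supervised surrogate loss on each incoming sample decides how strongly to overwrite the memory. This is a minimal illustrative sketch, not the paper's implementation; the gate function, the rank-1 outer-product update, and all dimensions are assumptions.

```python
import math
import random

D = 4  # hypothetical memory width; the paper does not specify dimensions

def outer(v, k):
    """Rank-1 fast-weight write: (v k^T)[i][j] = v[i] * k[j]."""
    return [[vi * kj for kj in k] for vi in v]

def matvec(M, x):
    """Read from memory: pred = M x."""
    return [sum(M[i][j] * x[j] for j in range(D)) for i in range(D)]

def trigger_gate(M, k, v):
    """Hypothetical 'Trigger': a self-supervised surrogate loss (how badly
    the current memory reconstructs v from k) sets this sample's write
    strength, so out-of-domain samples trigger stronger adaptation."""
    pred = matvec(M, k)
    err = sum((p - t) ** 2 for p, t in zip(pred, v)) / D
    return 1.0 - math.exp(-err)  # larger error -> stronger memory write

def updater(M, k, v):
    """Hypothetical 'Updater': gated memory write guided by the Trigger,
    M <- (1 - g) * M + g * v k^T."""
    g = trigger_gate(M, k, v)
    W = outer(v, k)
    M = [[(1.0 - g) * M[i][j] + g * W[i][j] for j in range(D)]
         for i in range(D)]
    return M, g

# Stream a few random (key, value) pairs through the memory.
random.seed(0)
M = [[0.0] * D for _ in range(D)]
for _ in range(8):
    k = [random.gauss(0.0, 1.0) for _ in range(D)]
    v = [random.gauss(0.0, 1.0) for _ in range(D)]
    M, g = updater(M, k, v)
```

The gate naturally decays as the memory absorbs a repeated association: once `M k` reconstructs `v` well, the surrogate error shrinks and subsequent writes become weaker, which mirrors the test-time adaptation behavior the paper attributes to Trigger.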