Efficient Multitask Learning in Small Language Models Through Upside-Down Reinforcement Learning

📅 2025-02-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Small language models (SLMs) underperform on multitask prompt generation in resource-constrained settings. Method: We propose an approach built on upside-down reinforcement learning, a paradigm that combines synthetic data distilled from Llama-3 with lightweight policy fine-tuning to improve the instruction understanding and cross-task generalization of a 100M-parameter GPT-2. Departing from conventional supervised fine-tuning, our approach optimizes end-to-end prompt generation directly for task relevance. Results: The method achieves near-state-of-the-art performance, matching Llama-3, Qwen2, and Mistral to within 5% on relevance scores, while reducing model size by 80× and inference latency by 90%. It enables real-time edge deployment and establishes a scalable path for efficiently adapting SLMs to complex NLP tasks.

📝 Abstract
In this work, we demonstrate that small language models (SLMs), specifically a 100M parameter GPT-2 model, can achieve competitive performance in multitask prompt generation tasks while requiring only a fraction of the computational resources needed by large language models (LLMs). Through a novel combination of upside-down reinforcement learning and synthetic data distillation from a powerful LLM, Llama-3, we train an SLM that achieves relevance scores within 5% of state-of-the-art models, including Llama-3, Qwen2, and Mistral, despite being up to 80 times smaller, making it highly suitable for resource-constrained and real-time applications. This study highlights the potential of SLMs as efficient multitask learners in multimodal settings, providing a promising alternative to LLMs for scalable, low-latency deployments.
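The abstract does not specify the training recipe at code level, but the core idea of upside-down reinforcement learning is to recast reward maximization as supervised learning conditioned on a desired return. A minimal sketch of that data-preparation step, assuming a hypothetical control-token format (`<REL_k>` prefixes encoding the target relevance score) ahead of ordinary next-token cross-entropy training:

```python
# Minimal sketch of upside-down RL data preparation for an SLM.
# The target relevance score is encoded as a control prefix; the model
# is then trained with standard next-token cross-entropy on the
# concatenated string. Token names and format are assumptions, not the
# paper's actual scheme.

def make_udrl_example(instruction: str, response: str, relevance: float) -> str:
    """Prefix the desired relevance so generation can be conditioned on it."""
    # Bucket the continuous score in [0, 1] into coarse tokens <REL_0>..<REL_9>.
    bucket = min(int(relevance * 10), 9)
    return f"<REL_{bucket}> Instruction: {instruction}\nResponse: {response}"

def udrl_dataset(rows):
    """Turn (instruction, response, relevance) triples into training strings."""
    return [make_udrl_example(i, r, s) for i, r, s in rows]

if __name__ == "__main__":
    rows = [
        ("Summarize the abstract.", "SLMs can rival LLMs on prompt generation.", 0.93),
        ("Translate to French.", "Bonjour.", 0.41),
    ]
    for ex in udrl_dataset(rows):
        print(ex.splitlines()[0])
```

At inference time, the same scheme would condition generation on a high-relevance prefix (e.g. `<REL_9>`) to steer the model toward its best behavior.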
Problem

Research questions and friction points this paper is trying to address.

Enhancing small language models' multitask performance
Reducing computational resources for language models
Applying upside-down reinforcement learning in SLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Upside-down reinforcement learning
Synthetic data distillation
Small language model efficiency
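The synthetic data distillation contribution above can be sketched as a simple generate-score-filter loop: a teacher model (Llama-3 in the paper) produces candidate responses, a relevance scorer filters them, and surviving pairs form the student's fine-tuning set. The function names and scorer below are placeholders, not the paper's implementation:

```python
# Hypothetical sketch of the synthetic-data distillation stage.
# `teacher_generate` and `score_relevance` stand in for Llama-3 and a
# learned relevance scorer, neither of which is specified in this listing.

def teacher_generate(instruction: str) -> str:
    """Placeholder for a Llama-3 call producing a candidate response."""
    return f"Synthetic response for: {instruction}"

def score_relevance(instruction: str, response: str) -> float:
    """Placeholder relevance score in [0, 1]; a real scorer would be learned."""
    return 1.0 if instruction in response else 0.0

def distill(instructions, threshold=0.5):
    """Keep only teacher outputs judged relevant enough to train the student."""
    kept = []
    for ins in instructions:
        out = teacher_generate(ins)
        if score_relevance(ins, out) >= threshold:
            kept.append((ins, out))
    return kept
```

The filtered pairs would then be fed to the SLM's fine-tuning loop, so the student only ever sees teacher outputs that pass the relevance bar.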