🤖 AI Summary
Small language models (SLMs) underperform on multi-task prompt generation in resource-constrained settings. Method: We propose upside-down reinforcement learning, a paradigm that combines synthetic data distillation from Llama-3 with lightweight policy fine-tuning, to improve the instruction understanding and cross-task generalization of a 100M-parameter GPT-2. Departing from conventional supervised fine-tuning, our approach optimizes end-to-end prompt generation directly for task relevance. Results: Our method achieves near-state-of-the-art performance, matching Llama-3, Qwen2, and Mistral to within 5% in relevance scoring, while reducing model size by 80× and inference latency by 90%. It enables real-time edge deployment and establishes a scalable pathway for efficiently adapting SLMs to complex NLP tasks.
📝 Abstract
In this work, we demonstrate that small language models (SLMs), specifically a 100M-parameter GPT-2, can achieve competitive performance on multitask prompt generation while requiring only a fraction of the computational resources needed by large language models (LLMs). Through a novel combination of upside-down reinforcement learning and synthetic data distillation from a powerful LLM, Llama-3, we train an SLM that achieves relevance scores within 5% of state-of-the-art models, including Llama-3, Qwen2, and Mistral, despite being up to 80 times smaller. This makes it highly suitable for resource-constrained and real-time applications. The study highlights the potential of SLMs as efficient multitask learners in multimodal settings, providing a promising alternative to LLMs for scalable, low-latency deployments.
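The upside-down RL component can be understood as reward-conditioned supervised learning: each distilled (instruction, prompt, relevance) triple is rewritten so the model learns to generate prompts *given* a desired relevance level, and at inference time the highest reward is requested. The following is a minimal sketch of that data preparation step; the function names, score binning, and control-token format are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of upside-down-RL-style data prep: turn LLM-distilled triples
# into reward-conditioned training examples. All names are hypothetical.

def bin_score(score: float, n_bins: int = 5) -> int:
    """Map a relevance score in [0, 1] to a discrete reward bin."""
    return min(int(score * n_bins), n_bins - 1)

def to_udrl_example(instruction: str, prompt: str, score: float) -> dict:
    """Prepend the desired reward as a control token, UDRL-style."""
    reward_token = f"<reward_{bin_score(score)}>"
    return {
        "input": f"{reward_token} {instruction}",
        "target": prompt,
    }

# Distilled (instruction, prompt, relevance) triples, e.g. from Llama-3.
distilled = [
    ("Summarize the article", "Write a 3-sentence summary of ...", 0.92),
    ("Translate to French", "Render the following text in French ...", 0.55),
]

train_set = [to_udrl_example(*ex) for ex in distilled]
# At inference time, condition on the top reward bin (e.g. "<reward_4>")
# to ask the fine-tuned SLM for its most task-relevant prompt.
```

The key design point is that no reward model is needed at generation time: relevance is baked into the input as a condition, which keeps inference as cheap as ordinary decoding on the 100M-parameter model.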