Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement

πŸ“… 2025-12-05
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Enterprises increasingly adopt large language models (LLMs) to automate routine tasks because of their low development barrier; however, LLMs incur significantly higher computational and energy costs than lightweight, task-specialized models. Method: This paper proposes Just-In-Time Replacement (JITR), the first framework enabling transparent, dynamic "downgrading" of LLMs. JITR detects high-frequency repetitive task patterns in LLM calls, automatically searches for suitable compact models, applies transfer learning, and performs lightweight fine-tuning to deploy efficient substitutes in real time. Contribution/Results: JITR preserves LLM-level usability while enabling self-evolving model-serving systems. Evaluated with the Poodle prototype, JITR reduces inference energy consumption and operational cost by 62% on average across representative tasks, with negligible performance degradation (<1.5% accuracy loss).

πŸ“ Abstract
Businesses increasingly rely on large language models (LLMs) to automate simple repetitive tasks instead of developing custom machine learning models. LLMs require few, if any, training examples and can be utilized by users without expertise in model development. However, this comes at the cost of substantially higher resource and energy consumption compared to smaller models, which often achieve similar predictive performance for simple tasks. In this paper, we present our vision for just-in-time model replacement (JITR), where, upon identifying a recurring task in calls to an LLM, the model is replaced transparently with a cheaper alternative that performs well for this specific task. JITR retains the ease of use and low development effort of LLMs, while saving significant cost and energy. We discuss the main challenges in realizing our vision regarding the identification of recurring tasks and the creation of a custom model. Specifically, we argue that model search and transfer learning will play a crucial role in JITR to efficiently identify and fine-tune models for a recurring task. Using our JITR prototype Poodle, we achieve significant savings for exemplary tasks.
Problem

Research questions and friction points this paper is trying to address.

Reducing resource consumption of large language models
Replacing LLMs with cheaper alternatives for recurring tasks
Maintaining ease of use while cutting costs and energy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Just-in-time model replacement for cost efficiency
Transparently swapping LLMs with cheaper task-specific models
Using model search and transfer learning for adaptation
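The replacement loop described above can be sketched as a routing layer in front of the LLM. This is a minimal illustrative sketch, not the paper's implementation: the task-signature heuristic, the recurrence threshold, and the specialist-training stub are all assumptions made for demonstration.

```python
import hashlib
from collections import Counter


def task_signature(prompt: str) -> str:
    """Reduce a prompt to a coarse task signature by stripping the
    variable payload (here, naively: everything after the first colon)."""
    template = prompt.split(":", 1)[0].strip().lower()
    return hashlib.sha256(template.encode()).hexdigest()[:12]


class JITRRouter:
    """Route calls to an expensive LLM until a task recurs often enough,
    then transparently swap in a cheaper task-specific model."""

    def __init__(self, llm, threshold=3):
        self.llm = llm              # fallback large model (a callable)
        self.threshold = threshold  # recurrences before replacement
        self.counts = Counter()     # task-signature frequencies
        self.specialists = {}       # signature -> compact model

    def __call__(self, prompt: str) -> str:
        sig = task_signature(prompt)
        self.counts[sig] += 1
        if sig in self.specialists:
            # Task already replaced: serve from the cheap model.
            return self.specialists[sig](prompt)
        if self.counts[sig] >= self.threshold:
            # In the paper's vision this step is model search plus
            # transfer learning and lightweight fine-tuning; here it
            # is stubbed out with a stand-in trained on LLM behavior.
            self.specialists[sig] = self._train_specialist(sig)
            return self.specialists[sig](prompt)
        return self.llm(prompt)

    def _train_specialist(self, sig):
        # Placeholder for model search + fine-tuning on collected
        # (prompt, LLM output) pairs for this recurring task.
        return lambda prompt: self.llm(prompt)
```

The caller never changes its interface: `router("classify sentiment: great product")` behaves identically before and after replacement, which is the "transparent" property the abstract emphasizes.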
πŸ”Ž Similar Papers
No similar papers found.