How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models

πŸ“… 2025-10-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Static prompting for black-box large language models (LLMs) lacks dynamic adaptability and struggles to handle diverse inputs, user preferences, and environmental shifts. Method: Advisor Models are lightweight, learnable proxies, trained with reinforcement learning, that generate per-instance natural language instructions to steer a frozen black-box LLM without modifying its weights. Contribution/Results: This work introduces the first trainable proxy serving as a dynamic interface to black-box systems, enabling cross-model transferability, user-specific personalization, and out-of-distribution robustness. Using contextual natural language feedback as a reward signal, Advisor Models significantly outperform static prompt optimization on reasoning and personalization tasks, achieving higher task accuracy and better adaptation under varying environmental conditions.

πŸ“ Abstract
Foundation models are increasingly deployed as black-box services, where model weights cannot be modified and customization is limited to prompting. While static prompt optimization has shown promise, it produces a single fixed prompt that fails to adapt to different inputs, users, or environments. We introduce Advisor Models, lightweight parametric policies trained with reinforcement learning to reactively issue natural language steering instructions in-context to black-box models. The advisor is a second small model that sits between the input and the model, shaping behavior on a per-instance basis using reward signals from the environment. Across multiple domains involving reasoning and personalization, we show that Advisor Models outperform static prompt optimizers, discovering environment dynamics and improving downstream task performance. We also demonstrate the generalizability of advisors by transferring them across black-box models, as well as the framework's ability to achieve specialization while retaining robustness to out-of-distribution inputs. Viewed more broadly, Advisor Models provide a learnable interface to black-box systems where the advisor acts as a parametric, environment-specific memory. We argue that dynamic optimization of black-box models via Advisor Models is a promising direction for enabling personalization and environment-adaptable AI with frontier-level capabilities.
Problem

Research questions and friction points this paper is trying to address.

Optimizing black-box LLMs through reactive natural language steering
Adapting model behavior per instance using lightweight advisor policies
Enabling dynamic personalization without modifying underlying model weights
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight policies trained with reinforcement learning
Issuing natural language steering instructions in-context
Achieving specialization while retaining robustness to distribution shifts
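The core loop described above can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: the advisor is reduced to a tiny softmax policy over a fixed set of candidate steering instructions, the frozen black-box model and the environment reward are stubs standing in for a real LLM API call and task score, and the update rule is plain REINFORCE with a running baseline.

```python
import math
import random

# Candidate steering instructions the toy advisor chooses between.
INSTRUCTIONS = [
    "Answer concisely.",
    "Reason step by step before answering.",
    "Double-check arithmetic.",
]

def black_box_llm(instruction: str, query: str) -> float:
    # Stub for the frozen black-box model: pretend that more explicit
    # reasoning guidance yields a better answer on reasoning-style queries.
    return float(len(instruction))

def environment_reward(output: float) -> float:
    # Stub environment reward: normalized answer quality.
    return output / 40.0

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

rng = random.Random(0)
logits = [0.0] * len(INSTRUCTIONS)   # the advisor's learnable parameters
baseline, lr = 0.0, 0.3

for step in range(500):
    probs = softmax(logits)
    a = rng.choices(range(len(INSTRUCTIONS)), weights=probs)[0]
    out = black_box_llm(INSTRUCTIONS[a], "a math word problem")
    r = environment_reward(out)
    adv = r - baseline                  # advantage vs. running baseline
    baseline += 0.1 * (r - baseline)
    # REINFORCE: raise the log-probability of instructions that beat
    # the baseline; the black-box model itself is never updated.
    for i in range(len(logits)):
        grad = (1.0 if i == a else 0.0) - probs[i]
        logits[i] += lr * adv * grad

best = INSTRUCTIONS[max(range(len(logits)), key=lambda i: logits[i])]
```

Under these stub dynamics the advisor converges on the step-by-step instruction, illustrating how reward alone, with no gradient access to the black-box model, is enough to shape the steering policy per environment.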
πŸ”Ž Similar Papers
No similar papers found.