🤖 AI Summary
Black-box large language models (LLMs) exhibit limited controllability and lack parameter access, hindering their performance on long-horizon, complex tasks such as reasoning, planning, and personalization.
Method: We propose Matryoshka, a controller-generator framework in which a lightweight white-box LLM acts as a controller that orchestrates multi-step prompting, guides staged generation by a black-box LLM, and enables self-optimization via intermediate feedback. The method combines reinforcement learning-based policy training, iterative prompt refinement, and preference-aligned modeling, achieving controllable multi-turn interaction without fine-tuning the black-box model's parameters.
Contribution/Results: Experiments demonstrate significant performance gains across three distinct task categories (reasoning, planning, and personalization). Matryoshka is the first framework to empirically validate the effectiveness, robustness, and generalizability of purely prompt-driven, controllable long-horizon generation with black-box LLMs.
📝 Abstract
Despite the impressive generative abilities of black-box large language models (LLMs), their inherent opacity hinders further advancements in capabilities such as reasoning, planning, and personalization. Existing works aim to enhance LLM capabilities via domain-specific adaptation or in-context learning, which require additional training on accessible model parameters, an infeasible option for black-box LLMs. To address this challenge, we introduce Matryoshika, a lightweight white-box LLM controller that guides a large-scale black-box LLM generator by decomposing complex tasks into a series of intermediate outputs. Specifically, we treat the black-box LLM as an environment, with Matryoshika serving as a policy that provides intermediate guidance through prompts to drive the black-box LLM. Matryoshika is trained to steer the outputs of the black-box LLM toward preferred behavior during iterative interaction, enabling controllable multi-turn generation and self-improvement in optimizing intermediate guidance. Empirical evaluations on three diverse tasks demonstrate that Matryoshika effectively enhances the capabilities of black-box LLMs in complex, long-horizon tasks, including reasoning, planning, and personalization. By leveraging this pioneering controller-generator framework to mitigate dependence on model parameters, Matryoshika provides a transparent and practical solution for improving black-box LLMs through controllable multi-turn generation using white-box LLMs.
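The controller-generator loop described above can be sketched minimally in Python. This is an illustrative toy, not the paper's implementation: `WhiteBoxController`, `black_box_generate`, and the scalar reward are stand-ins for the trained white-box policy, the black-box LLM API, and the preference/feedback signal, respectively.

```python
def black_box_generate(prompt: str) -> str:
    """Stand-in for a black-box LLM API call (the 'environment')."""
    return f"response to: {prompt}"


class WhiteBoxController:
    """Stand-in for the lightweight white-box policy that emits
    intermediate guidance prompts (a real controller is a small LM)."""

    def next_prompt(self, task: str, history: list) -> str:
        # Condition the next guidance prompt on the interaction so far.
        step = len(history) + 1
        return f"{task} (step {step}, given {len(history)} prior outputs)"

    def update(self, trajectory: list, reward: float) -> None:
        # Placeholder for preference-aligned policy optimization of the
        # controller (the black-box generator itself is never fine-tuned).
        pass


def solve(task: str, controller: WhiteBoxController, max_steps: int = 3) -> list:
    """Decompose `task` into intermediate outputs via controller prompts."""
    history = []
    for _ in range(max_steps):
        prompt = controller.next_prompt(task, history)  # policy action
        output = black_box_generate(prompt)             # environment step
        history.append((prompt, output))
    reward = 1.0  # stand-in for intermediate/preference feedback
    controller.update(history, reward)
    return history


traj = solve("plan a three-day trip", WhiteBoxController())
```

The key design point mirrored here is the separation of roles: only the small controller carries trainable state, so improvement comes from optimizing the prompts it issues rather than from any access to the black-box model's parameters.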