🤖 AI Summary
This work addresses the challenge that large language models often struggle to align with user intent during multi-turn interactions because two failure sources are entangled: semantic ambiguity in the prompt and limitations of the model's capability. The authors propose a joint adaptive mechanism that co-optimizes prompts (words) and parameters (weights) by framing the interaction as a unified optimization problem, enabling simultaneous adjustment of contextual semantics and model parameters at inference time. Theoretical analysis shows that semantic clarity is a prerequisite for efficient parameter updates, substantially reducing the parameter adjustments required. By refining ambiguous intents via textual gradients and bridging capability gaps through test-time parameter updates, the method achieves joint optimization over heterogeneous spaces. Evaluated on the MATH dataset, it outperforms the current state of the art by 30% while reducing interaction rounds by 40%, improving both accuracy and efficiency.
📝 Abstract
Test-time policy adaptation for multi-turn interactions (T2PAM) is essential for aligning Large Language Models (LLMs) with dynamic user needs at inference time. However, existing paradigms commonly treat test-time adaptation as a single-axis problem, either refining instructions alone (Prompt Engineering) or adjusting weights alone (Test-Time Training), ignoring that interaction failures stem from a coupled mix of ambiguity and incapacity. We argue that these two optimization paths are not merely additive but synergistic: semantic clarity acts as a preconditioner for effective parameter updates. To this end, we propose ROSA2, a framework that reformulates interaction as a joint optimization problem over the heterogeneous space of Words and Weights. By mathematically decomposing the error signal, ROSA2 uses textual gradients to rectify intent ambiguity and parameter updates to bridge capability gaps. Theoretically, we prove that this co-adaptation strictly reduces the parameter shift required for convergence. Empirically, ROSA2 outperforms state-of-the-art baselines by 30% on MATH while reducing interaction turns by 40%, demonstrating that refining the context unlocks the true potential of parameter updates.
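The core claim, that refining the context first shrinks the weight update needed to reach the user's intent, can be illustrated with a toy numeric sketch. This is an illustration under simplified assumptions, not the paper's ROSA2 algorithm: the "model response" is the product `w * p`, where `p` stands in for the prompt/context, `w` for the parameters, and the user's intent is a scalar target. We compare test-time training alone (updating `w` only) against alternating updates of `p` and `w`, and measure how far the weights must travel in each case.

```python
TARGET = 3.0   # the user's true intent, as a scalar goal
ETA = 0.1      # step size, shared by both spaces for simplicity

def run(turns=50, adapt_prompt=True):
    """Alternate 'word' and 'weight' gradient steps on loss (w*p - TARGET)^2."""
    w, p = 0.5, 1.0
    weight_shift = 0.0  # cumulative distance the weights move
    for _ in range(turns):
        if adapt_prompt:
            # Stand-in for a textual gradient: refine the prompt toward the intent.
            err = w * p - TARGET
            p -= ETA * 2 * err * w
        # Test-time parameter update: close the remaining capability gap.
        err = w * p - TARGET
        step = ETA * 2 * err * p
        w -= step
        weight_shift += abs(step)
    return (w * p - TARGET) ** 2, weight_shift

loss_solo, shift_solo = run(adapt_prompt=False)
loss_joint, shift_joint = run(adapt_prompt=True)
print(f"weights-only: loss={loss_solo:.2e}, weight shift={shift_solo:.2f}")
print(f"joint:        loss={loss_joint:.2e}, weight shift={shift_joint:.2f}")
```

In this toy setting both variants converge, but the joint run reaches the target with a markedly smaller cumulative weight shift, mirroring (in miniature) the paper's claim that clarified semantics reduce the parameter adjustment required.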