🤖 AI Summary
This work addresses the limited global planning capability of compact large language models (LLMs) in long-horizon, multi-step reasoning tasks, which often leads to error propagation. The authors propose PILOT, a framework in which a lightweight hypernetwork generates a query-conditioned latent guidance vector. This vector internalizes the strategic planning ability of a large teacher model as an intrinsic steering signal for the compact student model, so the backbone weights are left unmodified. On mathematical and coding benchmarks, PILOT consistently outperforms strong baselines (e.g., +8.9% on MATH500) with negligible inference latency.
📝 Abstract
Strategic planning is critical for multi-step reasoning, yet compact Large Language Models (LLMs) often lack the capacity to formulate global strategies, leading to error propagation in long-horizon tasks. Our analysis reveals that LLMs possess latent reasoning capabilities that can be unlocked when conditioned on explicit plans from a teacher model; however, runtime reliance on external guidance is often impractical due to latency and availability constraints. To bridge this gap, we propose PILOT (Planning via Internalized Latent Optimization Trajectories), a non-invasive framework designed to internalize the strategic oversight of large models into intrinsic Latent Guidance. Instead of altering backbone weights, PILOT employs a lightweight Hyper-Network to synthesize a query-conditioned Latent Guidance vector. This vector acts as an internal steering mechanism, guiding the model's representations toward optimal reasoning paths. Extensive experiments on mathematical and coding benchmarks demonstrate that PILOT effectively stabilizes reasoning trajectories, consistently outperforming strong baselines (e.g., +8.9% on MATH500) with negligible inference latency.
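The mechanism described above can be illustrated with a minimal sketch. The abstract does not specify the hypernetwork architecture, how the query is encoded, or where the vector is injected, so everything below is an assumption made for illustration: a two-layer MLP over mean-pooled query states, and additive injection of the resulting vector into every token's hidden state. The names `make_hypernetwork`, `guidance_vector`, `steer`, and the `scale` parameter are hypothetical and not taken from the paper.

```python
import math
import random


def make_hypernetwork(d_model, d_hidden, seed=0):
    """Build weights for a small two-layer MLP (hypothetical architecture)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]
    return w1, w2


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def guidance_vector(hyper, query_states):
    """Mean-pool the query token states and map them to one guidance vector."""
    w1, w2 = hyper
    n = len(query_states)
    pooled = [sum(col) / n for col in zip(*query_states)]
    hidden = [math.tanh(h) for h in matvec(w1, pooled)]
    return matvec(w2, hidden)


def steer(hidden_states, guidance, scale=1.0):
    """Add the guidance vector to every token state (assumed injection rule).

    The backbone weights are never touched; only activations are shifted.
    """
    return [[h + scale * g for h, g in zip(tok, guidance)] for tok in hidden_states]


# Toy usage: 5 query tokens with 8-dimensional hidden states.
rng = random.Random(1)
query_states = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
hyper = make_hypernetwork(d_model=8, d_hidden=4)
g = guidance_vector(hyper, query_states)
steered = steer(query_states, g, scale=0.5)
```

In this sketch only the hypernetwork weights would be trainable, which is one way to read "non-invasive": the frozen backbone sees shifted activations, and the extra inference cost is a single small MLP call per query.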