🤖 AI Summary
Small language models (SLMs) face inherent limitations in knowledge-intensive tasks due to constrained parameter capacity and rigid, fixed inference paradigms. To address this, we propose the Dynamic Task Vector Machine (DTVM) framework, the first approach to explicitly model the internal <think> reasoning process as a learnable mechanism for generating structured task representations, enabling SLMs to autonomously construct and optimize task vectors. Our method integrates reinforcement learning with verifiable rewards (RLVR), the Model Context Protocol (MCP), and agent-based web search, empowering a 1.7B-parameter model to perform complex open-domain question answering. On the SimpleQA benchmark, DTVM achieves 78.3% accuracy, significantly outperforming same-scale SLMs and matching the performance of large models such as DeepSeek-V3. This work demonstrates that explicitly modeling and optimizing the semantic structure of the reasoning process yields substantial gains in knowledge reasoning capability without increasing model size.
📝 Abstract
Small language models (SLMs) are inherently limited in knowledge-intensive tasks due to their constrained capacity. While test-time computation offers a path to enhanced performance, most approaches treat reasoning as a fixed or heuristic process. In this work, we propose a new paradigm: viewing the model's internal reasoning, delimited by <think> and </think> tags, as a dynamic task vector machine. Rather than treating the content inside these tags as a mere trace of thought, we interpret the generation process itself as a mechanism through which the model **constructs and refines its own task vectors** on the fly. We develop a method to optimize this dynamic task vector machine through reinforcement learning with verifiable rewards (RLVR) and train an agentic web-search model. We present Lucy, a 1.7B-parameter SLM that leverages this dynamic reasoning mechanism with Model Context Protocol (MCP) integration to achieve 78.3% accuracy on the SimpleQA benchmark, performing on par with much larger models such as DeepSeek-V3. This demonstrates that small models can rival large ones when equipped with structured, self-constructed task reasoning.
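To make the training signal concrete: in an RLVR setup, each sampled trace contains a <think> reasoning span followed by a final answer, and the reward is computed by programmatically verifying that answer against a reference. The sketch below is illustrative only, assuming a simple exact-match verifier; the helper names (`split_trace`, `verifiable_reward`) are hypothetical and not the paper's implementation.

```python
import re

# A trace is expected to look like: "<think>...reasoning...</think> final answer"
THINK_RE = re.compile(r"<think>(.*?)</think>\s*(.*)", re.DOTALL)

def split_trace(output: str):
    """Split a model output into its <think> reasoning span and final answer.

    Returns (reasoning, answer), or (None, None) if the tags are malformed.
    """
    m = THINK_RE.search(output)
    if not m:
        return None, None
    return m.group(1).strip(), m.group(2).strip()

def verifiable_reward(output: str, gold_answer: str) -> float:
    """RLVR-style scalar reward: 1.0 iff the post-<think> answer matches the
    reference (case- and whitespace-insensitive exact match), else 0.0.
    Traces with missing or malformed tags also receive 0.0.
    """
    _reasoning, answer = split_trace(output)
    if answer is None:
        return 0.0
    return 1.0 if answer.lower() == gold_answer.strip().lower() else 0.0

trace = "<think>The question asks for the capital of France.</think> Paris"
print(verifiable_reward(trace, "Paris"))  # 1.0
```

Because the reward depends only on the verified final answer, the model is free to reshape the content between the tags, which is exactly the degree of freedom the dynamic-task-vector view exploits.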