Timber: Training-free Instruct Model Refining with Base via Effective Rank

📅 2025-09-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Post-training instruction tuning is commonly viewed as superficial adaptation; this paper gives quantitative, weight-level support for that view by showing that the effective rank (eRank) of model weights changes only negligibly between a Base model and its Instruct counterpart. The analysis also exposes a trade-off: instruction tuning improves exploitation at the cost of limiting exploration. To address this, the authors propose Timber, a training-free method that partially reverts the Instruct model toward its paired Base model via subtle, targeted refinement of the weight deltas, enhancing exploration while preserving exploitation. Applied to Llama and Qwen instruction-tuned models, Timber consistently improves over the vanilla Instruct models, particularly on Pass@k performance for mathematical reasoning and code generation benchmarks, and offers both weight-level insight into post-training and a practical recipe for refining Instruct models without training.

📝 Abstract
Post-training, which elicits a pretrained Base model into the corresponding Instruct model, is widely considered to be superficial. In this work, we first reinforce this hypothesis by providing novel quantitative evidence at the weight level: the effective rank (eRank) remains negligibly changed. However, this superficiality also entails a critical trade-off, improving exploitation capabilities at the cost of limiting exploration. To tackle this issue, we propose Timber, a simple yet effective training-free method that enhances the exploration capability of the Instruct model while preserving its exploitation. The key insight is to partially revert the Instruct model towards its paired Base model through subtle yet targeted refinement of the weight deltas. Extensive experiments on the Llama and Qwen series demonstrate that Timber consistently improves vanilla Instruct models, particularly on Pass@k performance. Our findings offer new weight-level insights into the post-training stage and practical strategies to refine the Instruct model without training.
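The abstract's central quantity, the effective rank of a weight matrix, can be illustrated with a short sketch. The definition below (exponentiated Shannon entropy of the normalized singular values, following Roy and Vetterli's standard formulation) is one common choice; the paper's exact eRank variant may differ.

```python
import numpy as np

def effective_rank(W: np.ndarray) -> float:
    """Effective rank as exp of the entropy of the normalized
    singular-value distribution (a standard definition; the paper's
    precise variant is not specified here)."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / s.sum()                       # singular-value distribution
    p = p[p > 0]                          # guard against log(0)
    entropy = -(p * np.log(p)).sum()      # Shannon entropy H(p)
    return float(np.exp(entropy))         # exp(H) lies in [1, rank(W)]

# A rank-1 matrix has eRank 1; the identity has eRank equal to its size.
print(effective_rank(np.outer(np.ones(64), np.ones(64))))  # 1.0
print(effective_rank(np.eye(64)))                          # 64.0
```

Because eRank is a smooth summary of the whole spectrum, it can detect that post-training barely reshapes the weight spectrum even when individual entries change.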
Problem

Research questions and friction points this paper is trying to address.

Enhancing exploration capability of Instruct models without training
Addressing trade-off between exploitation and exploration in post-training
Refining Instruct model weights by partially reverting to Base model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free method refines Instruct model weights
Partial reversion to Base model enhances exploration
Targeted refinement of weight deltas preserves exploitation
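The bullets above describe partially reverting Instruct weights toward the Base model by refining the weight delta. A minimal sketch of that idea follows; the uniform shrinkage factor `alpha` is an illustrative assumption, not the paper's actual refinement rule, which targets specific parts of the delta rather than scaling it uniformly.

```python
import numpy as np

def refine_weight(w_base: np.ndarray, w_instruct: np.ndarray,
                  alpha: float = 0.1) -> np.ndarray:
    """Partially revert an Instruct weight matrix toward its Base
    counterpart. Here the post-training delta is simply shrunk by
    `alpha` (hypothetical stand-in for Timber's targeted refinement)."""
    delta = w_instruct - w_base            # post-training weight delta
    return w_base + (1.0 - alpha) * delta  # keep most of the delta

# alpha = 0 recovers the Instruct weights; alpha = 1 recovers Base.
wb = np.zeros((2, 2))
wi = np.ones((2, 2))
print(refine_weight(wb, wi, alpha=0.25))  # entries of 0.75
```

Being training-free, a rule of this shape needs only the paired Base and Instruct checkpoints and a single pass over the weights, with no gradients or data.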