🤖 AI Summary
To address the high computational cost and complex hyperparameter tuning of Transformer models, this paper proposes an efficient training framework integrating boosting mechanisms. The method introduces (1) a least-squares boosting objective—replacing standard cross-entropy—to concentrate gradient updates on hard-to-classify samples; (2) a subgrid token selection strategy that dynamically identifies information-dense local token subsets; and (3) importance-weighted sampling to suppress redundant computation. These components are jointly embedded into the Transformer training pipeline. Empirical evaluation across multiple fine-grained text classification benchmarks demonstrates that the approach accelerates convergence and improves generalization: it reduces training time by 32%–47% while raising accuracy by 1.8–3.4 percentage points on average. It also significantly lowers architecture search overhead.
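The three mechanisms above can be illustrated with a minimal sketch. Note this is an illustrative reconstruction under stated assumptions, not the paper's actual implementation: the function names, the softmax-over-losses weighting, and the top-k score-based token selection are all hypothetical stand-ins for the components the summary names.

```python
import numpy as np

def least_squares_boosting_loss(logits, onehot, sample_weights):
    """Sketch of a least-squares boosting objective: squared error between
    logits and one-hot targets, weighted so hard samples dominate.
    (Assumed form; the paper's exact objective may differ.)"""
    residuals = logits - onehot                    # (batch, num_classes)
    per_sample = (residuals ** 2).sum(axis=1)      # squared error per sample
    return float((sample_weights * per_sample).sum() / sample_weights.sum())

def subgrid_token_selection(token_scores, keep_ratio=0.5):
    """Sketch of subgrid token selection: keep the top fraction of tokens by
    an information-density score (e.g. attention mass). Returns the kept
    token indices in positional order."""
    k = max(1, int(len(token_scores) * keep_ratio))
    kept = np.argsort(token_scores)[-k:]           # indices of k largest scores
    return np.sort(kept)                           # restore positional order

def importance_weights(losses, temperature=1.0):
    """Sketch of importance-weighted sampling: boosting-style weights where
    higher-loss samples get more probability mass in the next round
    (softmax over temperature-scaled losses)."""
    z = np.asarray(losses, dtype=float) / temperature
    z -= z.max()                                   # numerical stability
    w = np.exp(z)
    return w / w.sum()

# Usage: reweight a batch by loss, then subsample it for the next step.
rng = np.random.default_rng(0)
losses = np.array([0.2, 1.5, 0.4, 2.1])
w = importance_weights(losses)
resampled = rng.choice(len(losses), size=2, replace=False, p=w)
```

The softmax weighting mirrors classic boosting schemes (e.g. AdaBoost's exponential reweighting); the summary does not specify which weighting the paper uses.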
📝 Abstract
Transformer architectures dominate modern NLP but often demand heavy computational resources and intricate hyperparameter tuning. To mitigate these challenges, we propose a novel framework, BoostTransformer, that augments transformers with boosting principles through subgrid token selection and importance-weighted sampling. Our method incorporates a least-squares boosting objective directly into the transformer pipeline, enabling more efficient training and improved performance. Across multiple fine-grained text classification benchmarks, BoostTransformer demonstrates both faster convergence and higher accuracy, surpassing standard transformers while minimizing architecture search overhead.