TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching

📅 2026-01-27
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the high activation memory cost of large language model fine-tuning, a challenge inadequately mitigated by existing data-agnostic optimization methods that suffer from inefficiency and instability. To this end, the authors propose TokenSeekβ€”a general, plug-and-play, instance-aware token selection mechanism that dynamically identifies and discards redundant tokens during training. By introducing instance-aware strategies to token-level activation compression for the first time, TokenSeek achieves substantial memory reduction while maintaining or even improving model performance. The approach is both interpretable and architecture-agnostic, demonstrating strong practical utility. Experimental results show that on Llama3.2-1B, TokenSeek attains comparable or superior performance using only 14.8% of the memory required by standard fine-tuning.
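The summary above describes the core idea, instance-aware token selection, only at a high level, and this page gives no implementation details. The following is a hypothetical sketch of per-instance token ditching: each token is scored by the L2 norm of its hidden state (an assumed importance proxy, not necessarily the paper's actual criterion), and only the top fraction of tokens per instance is kept before a layer's activations are cached, which is where the memory saving would come from.

```python
import numpy as np

def ditch_tokens(hidden, keep_ratio=0.25):
    """Hypothetical instance-aware token ditching.

    hidden: (batch, seq_len, dim) activations.
    Scores each token by its L2 norm (an assumed importance proxy)
    and keeps the top keep_ratio fraction of tokens per instance,
    preserving their original order. Returns the reduced activations
    and the kept token indices.
    """
    batch, seq_len, _ = hidden.shape
    k = max(1, int(seq_len * keep_ratio))
    scores = np.linalg.norm(hidden, axis=-1)            # (batch, seq_len)
    top = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    top = np.sort(top, axis=1)                          # restore positional order
    kept = np.take_along_axis(hidden, top[..., None], axis=1)
    return kept, top

# Selection is per instance: each example keeps its own most salient tokens.
x = np.random.randn(2, 16, 8)
kept, idx = ditch_tokens(x, keep_ratio=0.25)
# kept has shape (2, 4, 8): this layer's cached activations shrink 4x.
```

Because the selection depends on each input instance rather than a fixed, data-agnostic rule, different examples discard different tokens, which is the distinction the summary draws against prior activation-optimization methods.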

πŸ“ Abstract
Fine tuning has been regarded as a de facto approach for adapting large language models (LLMs) to downstream tasks, but the high training memory consumption inherited from LLMs makes this process inefficient. Among existing memory efficient approaches, activation-related optimization has proven particularly effective, as activations consistently dominate overall memory consumption. Although prior arts offer various activation optimization strategies, their data-agnostic nature ultimately results in ineffective and unstable fine tuning. In this paper, we propose TokenSeek, a universal plugin solution for various transformer-based models through instance-aware token seeking and ditching, achieving significant fine-tuning memory savings (e.g., requiring only 14.8% of the memory on Llama3.2 1B) with on-par or even better performance. Furthermore, our interpretable token seeking process reveals the underlying reasons for its effectiveness, offering valuable insights for future research on token efficiency. Homepage: https://runjia.tech/iclr_tokenseek/
Problem

Research questions and friction points this paper is trying to address.

fine-tuning
memory efficiency
large language models
activation optimization
token efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

TokenSeek
memory-efficient fine-tuning
instance-aware token selection
activation compression
transformer optimization