C2T: A Classifier-Based Tree Construction Method in Speculative Decoding

📅 2025-02-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address inefficiencies in token tree construction and high verification overhead in speculative decoding for large language models (LLMs), this paper proposes a dynamic token tree generation and pruning method based on a lightweight multi-feature classifier. Moving beyond joint probability, the approach incorporates diverse semantic and structural features to build a confidence-driven, adaptive tree structure, enabling real-time pruning of unreliable branches during decoding. This yields significant improvements in candidate token prediction accuracy and verification efficiency: total candidate tokens are reduced by 25% while maintaining or even increasing acceptance length. Experimental results demonstrate that, compared to state-of-the-art methods such as EAGLE-2, the proposed method substantially reduces inference latency and computational cost, establishing a novel paradigm for efficient LLM inference.

📝 Abstract
The growing scale of Large Language Models (LLMs) has exacerbated inference latency and computational costs. Speculative decoding methods, which aim to mitigate these issues, often face inefficiencies in the construction of token trees and the verification of candidate tokens. Existing strategies, including chain mode, static tree, and dynamic tree approaches, have limitations in accurately preparing candidate token trees for verification. We propose a novel method named C2T that adopts a lightweight classifier to generate and prune token trees dynamically. Our classifier considers additional feature variables beyond the commonly used joint probability to predict a confidence score for each draft token, determining whether it is a candidate token for verification. This method outperforms state-of-the-art (SOTA) methods such as EAGLE-2 on multiple benchmarks by reducing the total number of candidate tokens by 25% while maintaining or even improving the acceptance length.
Problem

Research questions and friction points this paper is trying to address.

Reduces LLM inference latency
Improves token tree construction
Optimizes candidate token verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight classifier for tree generation
Dynamic pruning of token trees
Enhanced feature variables for confidence prediction
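
The core idea above — score each draft token with a lightweight multi-feature classifier and prune low-confidence branches before verification — can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the feature set (branch joint log-probability, node depth, draft-model entropy), the linear weights, and the threshold are all hypothetical placeholders.

```python
import math

def node_features(joint_logprob, depth, entropy):
    """Feature vector for one draft-token node: joint log-probability along
    its branch, depth in the token tree, and draft-model entropy.
    (Illustrative features; the paper's exact feature set may differ.)"""
    return [joint_logprob, float(depth), entropy]

def confidence(features, weights, bias):
    """Lightweight linear classifier + sigmoid -> confidence score in [0, 1]."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def prune(nodes, weights, bias, threshold=0.5):
    """Keep only draft nodes whose predicted confidence clears the threshold;
    the survivors form the candidate tree sent to the target model for
    verification, shrinking verification overhead."""
    return [n for n in nodes
            if confidence(node_features(*n), weights, bias) >= threshold]

# Toy usage: each tuple is (joint_logprob, depth, entropy) for a draft node.
nodes = [(-0.1, 1, 0.4), (-2.5, 3, 2.1), (-0.3, 2, 0.6)]
kept = prune(nodes, weights=[1.0, -0.2, -0.5], bias=1.0)  # weights are made up
```

The design point is that the classifier is cheap (a handful of features per node), so tree construction adapts per step without adding meaningful latency of its own.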
👥 Authors
Feiye Huo (Peking University, Meituan)
Jianchao Tan (Meituan)
Kefeng Zhang (Meituan)
Xunliang Cai (Meituan)
Shengli Sun (Peking University)