🤖 AI Summary
To address the low token efficiency of self-consistency in chain-of-thought reasoning, this paper proposes ConfCov: an early-hypothesis pruning framework that preserves parallelism. ConfCov jointly models the model's internal confidence in individual hypotheses and the term-level coverage relationships among them, enabling dynamic identification and removal of redundant hypotheses via a lightweight weighted set cover algorithm. Crucially, pruning happens during generation while multiple reasoning paths continue to decode in parallel, so no sequential backtracking is required. Experiments with five large language models on three mathematical reasoning benchmarks show that ConfCov reduces token consumption by 23.7% on average (ranging from 10% to 35%), accelerates inference accordingly, and preserves accuracy. To our knowledge, this is the first work to jointly leverage confidence signals and lexical coverage for self-consistency pruning, achieving a favorable trade-off among efficiency, solution quality, and scalability.
📝 Abstract
Despite its simplicity and efficacy, the high token expenditure of self-consistency can limit its practical utility. Here we investigate whether self-consistency can be made more token-efficient for long chain-of-thought reasoning tasks, while preserving its parallelism, through early hypothesis pruning. Concretely, we generate all solutions in parallel, but periodically prune intermediate hypotheses that are deemed unnecessary based on two lightweight indicators: (a) the model's own confidence in individual hypotheses, and (b) how well candidate subsets of hypotheses, under consideration for continued retention, lexically cover the full current set. We design a fast weighted set cover algorithm that combines these two indicators; our evaluation of five LLMs on three math benchmarks shows that this method improves token efficiency for all models, by 10-35% in many cases.
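The pruning step described above can be sketched as a greedy weighted set cover: treat each hypothesis as a set of terms, score it by how many not-yet-covered terms it contributes weighted by the model's confidence, and keep only the hypotheses selected before coverage is complete. This is a minimal illustrative sketch, not the paper's exact algorithm; the function name `prune_hypotheses`, the term-set representation, and the specific scoring rule (new coverage times confidence) are assumptions for illustration.

```python
def prune_hypotheses(hypotheses, confidences):
    """Greedy weighted set cover over hypothesis term sets (illustrative sketch).

    hypotheses:  list of sets of terms (e.g. tokens from each intermediate
                 reasoning path); confidences: list of floats in (0, 1],
                 the model's confidence in each hypothesis.
    Returns the sorted indices of hypotheses to retain; the rest are pruned.
    """
    universe = set().union(*hypotheses) if hypotheses else set()
    covered, kept = set(), []
    remaining = set(range(len(hypotheses)))
    while covered != universe and remaining:
        # Greedy rule (assumed scoring): prefer the hypothesis that adds the
        # most new terms, scaled by confidence, so confident hypotheses that
        # cover unexplored lexical ground are kept first.
        best = max(remaining,
                   key=lambda i: len(hypotheses[i] - covered) * confidences[i])
        if not hypotheses[best] - covered:
            break  # no remaining hypothesis adds new coverage
        kept.append(best)
        covered |= hypotheses[best]
        remaining.remove(best)
    return sorted(kept)
```

With three hypotheses whose terms are `{x, y}`, `{y, z}`, and `{x, y, z}` and confidences 0.9, 0.8, 0.5, the greedy rule keeps the first two (they cover all terms at lower cost) and prunes the third, redundant one.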