🤖 AI Summary
To align large language models (LLMs) with human values efficiently at inference time, this paper proposes a decoding-time alignment algorithm. It partitions each generated output into fixed-length token segments, performs fine-grained scoring and rejection sampling per segment, and dynamically accepts or resamples segments based on real-time reward-model feedback, enabling early error correction and online regulation of the generation path. Unlike costly supervised fine-tuning (SFT) or computationally intractable full-sequence ranking, the approach achieves high alignment quality without parameter updates or exhaustive search. Experiments across six mainstream LLMs show that the method improves over SFT by up to 14.9 percentage points, outperforms DPO by up to 4.3 percentage points, and matches the performance of the strong Best-of-N baseline. The core contribution is a novel, reward-guided, segment-wise decoding-time alignment mechanism that establishes a lightweight, real-time, and high-fidelity paradigm for value alignment.
📝 Abstract
Aligning large language models with human values is crucial for their safe deployment; however, existing methods, such as fine-tuning, are computationally expensive and suboptimal. In contrast, inference-time approaches like Best-of-N sampling require practically infeasible computation to achieve optimal alignment. We propose STARS: Segment-level Token Alignment with Rejection Sampling, a decoding-time algorithm that steers model generation by iteratively sampling, scoring, and rejecting/accepting short, fixed-size token segments. This allows for early correction of the generation path, significantly improving computational efficiency and boosting alignment quality. Across a suite of six LLMs, we show that STARS outperforms Supervised Fine-Tuning (SFT) by up to 14.9 percentage points and Direct Preference Optimization (DPO) by up to 4.3 percentage points on win-rates, while remaining highly competitive with strong Best-of-N baselines. Our work establishes granular, reward-guided sampling as a generalizable, robust, and efficient alternative to traditional fine-tuning and full-sequence ranking methods for aligning LLMs.
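The segment-level sample/score/accept-or-resample loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `sample_segment`, `reward`, the acceptance rule, and the toy stand-ins are all hypothetical assumptions chosen to make the sketch self-contained and runnable.

```python
import random

def stars_decode(sample_segment, reward, prompt, num_segments=4,
                 max_tries=8, threshold=0.0):
    """Sketch of segment-level rejection sampling in the STARS style.

    sample_segment(context) -> one fixed-size segment (list of tokens)
    reward(context)         -> reward-model score for the partial output

    A candidate segment is accepted if it does not lower the running
    reward by more than `threshold`; otherwise it is resampled, keeping
    the best-scoring candidate as a fallback after `max_tries` attempts.
    """
    context = list(prompt)
    for _ in range(num_segments):
        base = reward(context)
        best_seg, best_score = None, float("-inf")
        for _ in range(max_tries):
            seg = sample_segment(context)
            score = reward(context + seg)
            if score > best_score:
                best_seg, best_score = seg, score
            if score >= base - threshold:
                # Accept early: this segment keeps the reward on track.
                break
        # If no candidate passed, fall back to the best one seen.
        context += best_seg
    return context

# Toy stand-ins (hypothetical): tokens are ints, segments have 3 tokens,
# and the "reward model" simply prefers a higher fraction of even tokens.
rng = random.Random(0)
toy_sample = lambda ctx: [rng.randrange(10) for _ in range(3)]
toy_reward = lambda ctx: sum(1 for t in ctx if t % 2 == 0) / max(len(ctx), 1)

out = stars_decode(toy_sample, toy_reward, prompt=[0], num_segments=3)
```

With real models, `sample_segment` would draw a fixed-length continuation from the LLM and `reward` would query the reward model on the partial sequence; the early-accept rule is one simple choice among several plausible acceptance criteria.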