Concept Training for Human-Aligned Language Models

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of the conventional next-token prediction objective, which treats semantically similar yet lexically distinct valid continuations as mutually exclusive targets, misaligning training with human semantic understanding. To remedy this, the paper proposes replacing individual tokens with "concepts"—sets of semantically related words—as prediction targets. It introduces concept-level supervision, constructing concept sets via semantic clustering and jointly optimizing a concept-level objective alongside the standard language modeling loss. This encourages the model to prioritize semantic consistency over surface-form fidelity. Empirical results show improved performance on multiple lexical semantic similarity benchmarks and significantly reduced perplexity on semantically salient words, at the cost of only a marginal increase in overall token-level perplexity, keeping language modeling performance competitive. Collectively, these findings indicate enhanced alignment between the language model's predictions and human semantic judgments.
📝 Abstract
The next-token prediction (NTP) objective trains language models to predict a single continuation token at each step. In natural language, however, a prefix can be continued in many valid ways, and even similar meanings may differ in surface form. For example, the sentence "this website is safe to _browse_" could plausibly continue with words such as browse, search, visit, surf, or navigate. While standard NTP training treats these alternatives as mutually exclusive targets, we explore a framework that instead predicts concepts, approximated as sets of semantically related tokens. We show that models trained with concept supervision exhibit stronger alignment with human semantic similarity judgments on multiple lexical benchmarks. These gains are accompanied by lower perplexity on semantically meaningful words (definition in Section 3.1), and a modest increase in global token-level perplexity, reflecting a tradeoff between standard NTP optimization and concept-level supervision. Our results suggest that concept-level objectives can improve semantic alignment while maintaining competitive language modeling performance.
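The joint objective described above—standard NTP loss plus a concept-level term that rewards probability mass placed anywhere in the target's concept set—can be sketched as follows. This is an illustrative assumption, not the authors' exact formulation: the function name `concept_ntp_loss`, the boolean `concept_mask` representation, and the blending weight `alpha` are all hypothetical.

```python
import torch
import torch.nn.functional as F

def concept_ntp_loss(logits, target_ids, concept_mask, alpha=0.5):
    """Blend standard NTP loss with a concept-level loss (sketch).

    logits:       (batch, vocab) next-token scores
    target_ids:   (batch,) gold token ids
    concept_mask: (batch, vocab) bool, True for tokens in the target's
                  concept set, e.g. {browse, search, visit, surf, navigate}
    alpha:        weight on the standard NTP term (assumed, not from paper)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Standard next-token prediction: negative log-likelihood of the gold token.
    ntp = F.nll_loss(log_probs, target_ids)
    # Concept term: log of the total probability assigned to ANY token in the
    # concept set, so synonyms are not penalized as wrong answers.
    concept_logp = torch.logsumexp(
        log_probs.masked_fill(~concept_mask, float("-inf")), dim=-1
    )
    concept = -concept_logp.mean()
    return alpha * ntp + (1 - alpha) * concept
```

Since each concept set contains the gold token, the concept term is never larger than the NTP term; the blend lets training trade surface-form fidelity against semantic consistency, matching the perplexity tradeoff reported in the abstract.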
Problem

Research questions and friction points this paper is trying to address.

next-token prediction
semantic alignment
language models
concept training
lexical semantics
Innovation

Methods, ideas, or system contributions that make the work stand out.

concept training
next-token prediction
semantic alignment
language modeling
lexical semantics