Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time

📅 2025-05-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing LLM alignment methods overlook the human "satisficing" decision-making principle: optimizing a primary objective (e.g., helpfulness) while requiring secondary objectives (e.g., harmlessness) to meet a predefined acceptability threshold. Method: We propose SITAlign, the first framework to formally integrate satisficing from behavioral economics into LLM alignment. It is a constraint-aware, multi-objective alignment approach operating at inference time, combining constrained-optimization decoding, multi-objective reward modeling, and threshold-driven sampling. Contribution/Results: We derive a theoretical suboptimality bound for our method. On PKU-SafeRLHF, with harmlessness as a hard constraint and helpfulness as the primary objective, SITAlign outperforms the state-of-the-art multi-objective decoding baseline by 22.3% in GPT-4 win-tie rate for helpfulness while adhering to the prescribed harmlessness threshold, unlike unconstrained or soft-constraint alternatives.

📝 Abstract
Aligning large language models with humans is challenging due to the inherently multifaceted nature of preference feedback. While existing approaches typically frame this as a multi-objective optimization problem, they often overlook how humans actually make decisions. Research on bounded rationality suggests that human decision-making follows satisficing strategies: optimizing primary objectives while ensuring others meet acceptable thresholds. To bridge this gap and operationalize the notion of satisficing alignment, we propose SITAlign: an inference-time framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria. We provide theoretical insights by deriving sub-optimality bounds for our satisficing-based inference alignment approach. We empirically validate SITAlign's performance through extensive experimentation on multiple benchmarks. For instance, on the PKU-SafeRLHF dataset with the primary objective of maximizing helpfulness while ensuring a threshold on harmlessness, SITAlign outperforms the state-of-the-art multi-objective decoding strategy by a margin of 22.3% in terms of GPT-4 win-tie rate for the helpfulness reward while adhering to the threshold on harmlessness.
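The satisficing objective the abstract describes can be written as a constrained optimization at decoding time. Under assumed notation (not taken from the paper): for a prompt $x$ and candidate response $y$, with primary reward $r_{\mathrm{p}}$, secondary reward $r_{\mathrm{s}}$, and acceptability threshold $\tau$:

```latex
\max_{y}\; r_{\mathrm{p}}(x, y)
\quad \text{subject to} \quad r_{\mathrm{s}}(x, y) \ge \tau
```

Here the secondary criterion (e.g., harmlessness) is only required to clear $\tau$, not to be maximized, which is the "satisficing" contrast with standard multi-objective scalarization.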
Problem

Research questions and friction points this paper is trying to address.

Aligning LLMs with multifaceted human preferences
Optimizing primary objectives while meeting secondary thresholds
Improving inference-time alignment via satisficing strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Satisficing alignment for LLM inference
Threshold-based constraints on secondary criteria
Maximizing primary objective with sub-optimality bounds
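The ideas above (threshold-based constraints on a secondary criterion while maximizing the primary one) can be sketched as a simple inference-time selection rule over sampled candidates. This is a hedged illustration, not the paper's actual decoding algorithm: the function name, the tuple-based toy rewards, and the fallback rule for infeasible cases are all assumptions for the sake of the example.

```python
def satisficing_select(candidates, primary_reward, secondary_reward, threshold):
    """Pick the candidate maximizing the primary reward among those whose
    secondary reward meets the threshold; if none qualify, fall back to
    the candidate with the highest secondary reward."""
    feasible = [c for c in candidates if secondary_reward(c) >= threshold]
    if feasible:
        return max(feasible, key=primary_reward)
    return max(candidates, key=secondary_reward)

# Toy candidates scored as (helpfulness, harmlessness) pairs.
candidates = [(0.9, 0.2), (0.7, 0.8), (0.5, 0.95)]
best = satisficing_select(
    candidates,
    primary_reward=lambda c: c[0],    # maximize helpfulness
    secondary_reward=lambda c: c[1],  # harmlessness must clear threshold
    threshold=0.75,
)
# best is (0.7, 0.8): the most helpful candidate that satisfies the
# harmlessness threshold, even though (0.9, 0.2) is more helpful overall.
```

Note how the most helpful candidate is rejected for violating the harmlessness constraint, which is the satisficing behavior in miniature.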