VORTEX: Aligning Task Utility and Human Preferences through LLM-Guided Reward Shaping

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
In social impact optimization, existing AI systems struggle to flexibly accommodate dynamically expressed human preferences in natural language while preserving task utility. This paper proposes VORTEX, a natural-language-guided reward shaping framework that integrates human preferences into multi-objective optimization: a large language model parses textual feedback to generate dynamic shaping rewards, and a text-gradient prompting mechanism enables parameter-free iterative refinement from verbal reinforcement. The authors prove convergence to the Pareto-optimal trade-off set. Crucially, the method requires neither modification of the underlying solver nor pre-specified trade-off weights. Evaluated on real-world allocation tasks, it significantly improves human-alignment coverage (+23.6%) while retaining ≥98.4% of original task performance, outperforming state-of-the-art baselines.
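
The loop described above can be pictured in a few lines: the LLM turns a free-text preference into a scoring rule, and that rule is added to the solver's existing objective as a shaping term. The sketch below is illustrative only; the `llm` completion client, function names, and prompt wordings are assumptions, not VORTEX's actual implementation.

```python
# Minimal sketch of LLM-guided reward shaping. The `llm` client and all
# prompts are hypothetical; VORTEX's real procedure is more involved.

def make_shaping_reward(llm, preference_text: str):
    """Turn a natural-language preference into a shaping reward."""
    rubric = llm.complete(
        "Turn this stakeholder preference into a numeric scoring rule "
        f"over allocations, range [0, 1]:\n{preference_text}"
    )

    def shaping_reward(allocation) -> float:
        # Score a candidate allocation against the LLM-derived rubric.
        score = llm.complete(
            f"Rubric:\n{rubric}\nAllocation:\n{allocation}\n"
            "Return only a number in [0, 1]."
        )
        return float(score)

    return shaping_reward


def shaped_objective(task_reward, shaping_reward, allocation) -> float:
    # The solver's original utility term is kept intact; the preference
    # signal is layered on top as an additive shaping term.
    return task_reward(allocation) + shaping_reward(allocation)
```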

📝 Abstract
In social impact optimization, AI decision systems often rely on solvers that optimize well-calibrated mathematical objectives. However, these solvers cannot directly accommodate evolving human preferences, typically expressed in natural language rather than formal constraints. Recent approaches address this by using large language models (LLMs) to generate new reward functions from preference descriptions. While flexible, they risk sacrificing the system's core utility guarantees. In this paper, we propose VORTEX, a language-guided reward shaping framework that preserves established optimization goals while adaptively incorporating human feedback. By formalizing the problem as multi-objective optimization, we use LLMs to iteratively generate shaping rewards based on verbal reinforcement and text-gradient prompt updates. This allows stakeholders to steer decision behavior via natural language without modifying solvers or specifying trade-off weights. We provide theoretical guarantees that VORTEX converges to Pareto-optimal trade-offs between utility and preference satisfaction. Empirical results in real-world allocation tasks demonstrate that VORTEX outperforms baselines in satisfying human-aligned coverage goals while maintaining high task performance. This work introduces a practical and theoretically grounded paradigm for human-AI collaborative optimization guided by natural language.
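
The abstract's multi-objective framing can be written out explicitly. The following is an illustrative formalization with symbols chosen here, not taken from the paper: f_u denotes the solver's calibrated task utility and f_p the LLM-derived preference-satisfaction objective.

```latex
% Illustrative two-objective formulation (symbols assumed, not the paper's):
\[
  \max_{x \in \mathcal{X}} \; \bigl( f_u(x),\; f_p(x) \bigr)
\]
% A decision x^{*} is Pareto-optimal when no feasible x dominates it:
\[
  \nexists\, x \in \mathcal{X} :\;
  f_u(x) \ge f_u(x^{*}) \;\wedge\; f_p(x) \ge f_p(x^{*}),
  \text{ with at least one inequality strict.}
\]
```

Convergence to this Pareto-optimal set is exactly the trade-off guarantee the abstract claims: neither task utility nor preference satisfaction can be improved without degrading the other.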
Problem

Research questions and friction points this paper is trying to address.

Aligning AI decision systems with evolving human preferences expressed in natural language
Preserving core utility guarantees while incorporating human feedback through reward shaping
Enabling stakeholders to steer decision behavior without modifying solvers or specifying weights
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-guided reward shaping for multi-objective optimization
Iterative text-gradient updates from verbal reinforcement (see the sketch after this list)
Pareto-optimal trade-offs between utility and preferences
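
The text-gradient update replaces numeric gradients with natural-language critiques. A minimal sketch, assuming a hypothetical `llm` client and a `solve` callable that runs the unmodified solver under the current shaped reward; the prompt wordings are illustrative, not the paper's exact procedure.

```python
# Sketch of a text-gradient refinement loop driven by verbal reinforcement.
# `llm` and `solve` are assumed interfaces, not the paper's API.

def refine_prompt(llm, solve, prompt: str, rounds: int = 3) -> str:
    """Iteratively rewrite the shaping prompt from LLM critiques."""
    for _ in range(rounds):
        allocation = solve(prompt)  # run the unmodified solver
        # "Verbal reinforcement": critique the outcome in natural
        # language instead of computing a numeric gradient.
        critique = llm.complete(
            f"Prompt:\n{prompt}\nResulting allocation:\n{allocation}\n"
            "Critique how well this satisfies the stated preferences."
        )
        # The "text gradient": rewrite the prompt to address the
        # critique; no solver parameters are touched (parameter-free).
        prompt = llm.complete(
            "Rewrite the prompt so the critique no longer applies.\n"
            f"Prompt:\n{prompt}\nCritique:\n{critique}"
        )
    return prompt
```

Because only the prompt changes between rounds, the underlying solver and its utility guarantees are left untouched, consistent with the paper's claim that no solver modification or trade-off weights are needed.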