Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models

📅 2024-10-02

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Existing RLHF methods rely on explicit human feedback, which limits their ability to model fine-grained preferences accurately. This paper proposes GazeReward, the first framework to systematically integrate eye-tracking data, as implicit behavioral feedback, into reward modeling, uncovering learnable associations between cognitive signals (e.g., fixation duration, regressive saccade paths) and textual preferences. The approach unifies eye-movement feature engineering, multimodal reward modeling, PPO-based reinforcement learning, and GazeGAN, a generative model for synthesizing eye-movement data, and enables end-to-end alignment fine-tuning of large language models (LLMs). Evaluated on multiple preference benchmarks, including HH-RLHF and PKU-SafeRLHF, the reward model achieves average accuracy gains of 4.2–7.8%, with significant improvements in robustness and cross-task generalization. GazeReward establishes a novel paradigm for implicit-feedback-driven LLM alignment.
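The summary mentions engineering features such as fixation duration and regressive saccades from eye-movement recordings. The following is a minimal, hypothetical sketch of how per-word features of that kind could be computed from a chronological fixation sequence. The feature set, the `(word_index, duration_ms)` input format, and the names `WordGazeFeatures` and `word_gaze_features` are illustrative assumptions, not GazeReward's actual feature pipeline.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class WordGazeFeatures:
    total_fixation_ms: float = 0.0  # total time spent fixating the word
    fixation_count: int = 0         # number of fixations landing on the word
    regressions_in: int = 0         # regressive saccades that land on the word


def word_gaze_features(
    num_words: int, fixations: Sequence[Tuple[int, float]]
) -> List[WordGazeFeatures]:
    """Hypothetical feature extractor.

    fixations: chronological list of (word_index, duration_ms) tuples.
    """
    feats = [WordGazeFeatures() for _ in range(num_words)]
    prev_word = None
    for word, duration in fixations:
        if not 0 <= word < num_words:
            raise ValueError(f"fixation on unknown word index {word}")
        f = feats[word]
        f.total_fixation_ms += duration
        f.fixation_count += 1
        # A saccade that lands on an earlier word than the previous
        # fixation is counted as a regression into that word.
        if prev_word is not None and word < prev_word:
            f.regressions_in += 1
        prev_word = word
    return feats
```

For example, with four words and fixations `[(0, 200), (1, 180), (2, 250), (1, 150), (3, 220)]`, word 1 accumulates 330 ms over two fixations, one of which is a regression back from word 2. Such per-word vectors would still need pooling or alignment to model tokens before a reward model could consume them.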

📝 Abstract

Advancements in Natural Language Processing (NLP) have led to the emergence of Large Language Models (LLMs) such as GPT, Llama, Claude, and Gemini, which excel across a range of tasks but require extensive fine-tuning to align their outputs with human expectations. A widely used method for achieving this alignment is Reinforcement Learning from Human Feedback (RLHF), which, despite its success, faces challenges in accurately modelling human preferences. In this paper, we introduce GazeReward, a novel framework that integrates implicit feedback, specifically eye-tracking (ET) data, into the Reward Model (RM). In addition, we explore how ET-based features can provide insights into user preferences. Through ablation studies, we test our framework with different integration methods, LLMs, and ET generator models, demonstrating that our approach significantly improves the accuracy of the RM on established human preference datasets. This work advances the ongoing discussion on optimizing AI alignment with human values, exploring the potential of cognitive data for shaping future NLP research.
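The abstract describes folding ET features into the reward model and measuring RM accuracy on human preference datasets. The sketch below illustrates those two ideas under stated assumptions and is not GazeReward's actual architecture: it assumes a linear reward head over a text embedding simply concatenated with pooled gaze features, uses the standard pairwise Bradley-Terry loss common in RLHF reward modelling, and scores accuracy as the fraction of pairs in which the chosen response receives the higher reward. The names `reward`, `bradley_terry_loss`, and `preference_accuracy` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, Vector]  # (text embedding, pooled gaze features)


def reward(weights: Vector, text_emb: Vector, gaze_feats: Vector) -> float:
    """Hypothetical linear reward head over [text embedding ; gaze features]."""
    x = list(text_emb) + list(gaze_feats)  # plain concatenation (an assumption)
    if len(x) != len(weights):
        raise ValueError("weight vector does not match concatenated feature size")
    return sum(w * xi for w, xi in zip(weights, x))


def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)), computed stably."""
    margin = r_chosen - r_rejected
    # log(1 + exp(-margin)), rearranged so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def preference_accuracy(
    weights: Vector, pairs: Sequence[Tuple[Example, Example]]
) -> float:
    """Fraction of (chosen, rejected) pairs where the chosen reward is higher."""
    correct = sum(
        reward(weights, *chosen) > reward(weights, *rejected)
        for chosen, rejected in pairs
    )
    return correct / len(pairs)
```

Concatenation is only one way to inject gaze information; an alternative would be to add projected gaze features to the token embeddings before the reward head. The loss and accuracy functions are independent of that choice.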

Problem

Research questions and friction points this paper is trying to address.

Aligning LLM outputs with human expectations using gaze-based feedback.

Improving Reward Model accuracy via eye-tracking data integration.

Exploring cognitive data to enhance AI-human value alignment.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates eye-tracking data into the Reward Model

Uses gaze-based feedback to gain insight into human preferences

Improves Reward Model accuracy on human preference datasets

🔎 Similar Papers

FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback