🤖 AI Summary
This work addresses a limitation of large language models (LLMs) in web action generation: they prioritize subjective plausibility over objective behavioral accuracy. To close this gap, it introduces a behavior modeling paradigm grounded in real-world online shopping interaction data. Methodologically, it (1) establishes the first quantitative benchmark for evaluating web interaction behaviors, and (2) proposes a dual-path training framework that combines fine-tuning on real behavioral data with synthetic reasoning-trajectory augmentation, integrating behavioral sequence modeling with explicit stepwise reasoning injection. Experiments on DeepSeek-R1, Llama, and Claude show that, compared to prompt-engineering-only baselines, the approach achieves significant gains in action prediction accuracy on real-world action datasets, and that explicit reasoning further improves behavioral fidelity. The result is a quantifiable, reproducible methodology for high-fidelity human behavior simulation and LLM-based agent development.
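To make the "reasoning injection" idea concrete, here is a minimal sketch of how an observed shopping session plus a synthetic reasoning trace might be serialized into a supervised fine-tuning record. The record layout, the `<think>` delimiter, and all action names (`search`, `click`, `add_to_cart`, etc.) are illustrative assumptions, not the paper's actual data format:

```python
import json

def build_sft_record(context_actions, reasoning, next_action):
    """Serialize one observed session into a fine-tuning record:
    the action history becomes the prompt, and a synthetic reasoning
    trace is prepended to the target so the model learns to emit an
    explicit rationale before predicting the next action."""
    prompt = "Session so far:\n" + "\n".join(
        f"{i + 1}. {a}" for i, a in enumerate(context_actions)
    )
    # Hypothetical convention: wrap the rationale in <think> tags,
    # then state the predicted action on its own line.
    completion = f"<think>{reasoning}</think>\nNext action: {next_action}"
    return {"prompt": prompt, "completion": completion}

# Hypothetical session: the user searches, opens an item, reads reviews.
record = build_sft_record(
    context_actions=[
        "search('wireless earbuds')",
        "click(item_id=1042)",
        "read_reviews(item_id=1042)",
    ],
    reasoning=(
        "The user inspected reviews after opening the item, "
        "which suggests purchase intent rather than continued browsing."
    ),
    next_action="add_to_cart(item_id=1042)",
)
print(json.dumps(record, indent=2))
```

Records of this shape could be fed to any standard instruction-tuning pipeline; training with versus without the `<think>` span would correspond to the paper's comparison of reasoning-augmented versus plain behavioral fine-tuning.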
📝 Abstract
Recent research shows that LLMs can simulate "believable" human behaviors to power LLM agents via prompt-only methods. In this work, we focus on evaluating and improving LLMs' objective "accuracy" rather than their subjective "believability" in the web action generation task, leveraging a large-scale, real-world dataset of human actions collected from online shopping. We present the first comprehensive quantitative evaluation of state-of-the-art LLMs (e.g., DeepSeek-R1, Llama, and Claude) on the task of web action generation. Our results show that fine-tuning LLMs on real-world behavioral data substantially improves their ability to generate actions compared to prompt-only methods. Furthermore, incorporating synthesized reasoning traces into model training leads to additional performance gains, demonstrating the value of explicit rationale in behavior modeling. This work establishes a new benchmark for evaluating LLMs in behavior simulation and offers actionable insights into how real-world action data and reasoning augmentation can enhance the fidelity of LLM agents.