Peeking Ahead of the Field Study: Exploring VLM Personas as Support Tools for Embodied Studies in HCI

📅 2026-02-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the high cost, long duration, and error-proneness of field studies in human–computer interaction (HCI), for which efficient, low-cost preliminary evaluation methods are often lacking. To bridge this gap, the authors propose using vision-language models (VLMs) to construct virtual personas that simulate human spatial perception, affective empathy, and behavioral responses in a street-crossing task with autonomous vehicles, reportedly the first application of VLM-based personas in embodied interaction research. The personas respond to video stimuli, and their responses are compared quantitatively and qualitatively against data from real human participants. Results show that VLM personas replicate key human response patterns, such as average crossing times (5.25 s vs. 5.07 s), but fall short in behavioral variability and depth. The method nevertheless shows promise for formative studies, field study preparation, and human data augmentation, offering HCI a rapid-prototyping approach to pre-deployment evaluation.

📝 Abstract

Field studies are irreplaceable but costly, time-consuming, and error-prone, and therefore need careful preparation. Inspired by rapid prototyping in manufacturing, we propose a fast, low-cost evaluation method using Vision-Language Model (VLM) personas to simulate outcomes comparable to field results. While LLMs show human-like reasoning and language capabilities, autonomous vehicle (AV)-pedestrian interaction requires spatial awareness, emotional empathy, and behavioral generation. This raises our research question: To what extent can VLM personas mimic human responses in field studies? We conducted two parallel studies on a street-crossing task: 1) a real-world study with 20 participants, and 2) a video study using 20 VLM personas. We compared their responses and interviewed five HCI researchers on potential applications. Results show that VLM personas mimic human response patterns (e.g., average crossing times of 5.25 s vs. 5.07 s) but lack behavioral variability and depth. They show promise for formative studies, field study preparation, and human data augmentation.
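
To put the two reported crossing-time averages in perspective, the following is a minimal sketch of the arithmetic, using only the two values quoted in the abstract. The variable names `mean_a` and `mean_b` are illustrative and deliberately neutral, since the abstract does not state which value belongs to the human participants and which to the VLM personas. Expressing the gap relative to the midpoint of the two values is one possible choice, not necessarily the comparison used in the paper.

```python
# Average crossing times quoted in the abstract, in seconds.
# Assumption: the labels are neutral because the abstract does not
# say which average belongs to which group.
mean_a = 5.25
mean_b = 5.07

# Absolute difference between the two averages, in seconds.
abs_gap = abs(mean_a - mean_b)

# Difference relative to the midpoint of the two averages.
rel_gap = abs_gap / ((mean_a + mean_b) / 2)

print(f"absolute gap: {abs_gap:.2f} s")
print(f"relative gap: {rel_gap:.1%}")
```

A small gap between averages says nothing about the spread of individual responses, which is where the abstract reports the personas falling short (behavioral variability and depth).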

Problem

Research questions and friction points this paper is trying to address.

field study

Vision-Language Model

human behavior simulation

autonomous vehicle-pedestrian interaction

HCI

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Model

Field Study Simulation

HCI