🤖 AI Summary
Problem: Existing evaluations of large language models (LLMs) as social companions, particularly virtual pets, lack a systematic, multidimensional assessment framework. Method: We propose Pet-Bench, the first benchmark designed specifically to evaluate LLMs in emotionally grounded human–pet interaction. It introduces two novel dimensions, "self-evolution" and "developmental behavior," moving beyond conventional role-playing paradigms, and encompasses memory-augmented dialogue, psychological interaction, intelligent scheduling, and cross-modal simulation, grounded in over 7,500 real-world human–pet interaction scenarios. Contribution/Results: Empirical evaluation of 28 mainstream LLMs reveals a nonlinear relationship between model scale and affective-companionship capability. Pet-Bench establishes the first open-source, domain-specific evaluation suite for virtual pets, providing both a standardized benchmark and an empirical foundation for targeted model customization and optimization in affective AI.
📝 Abstract
As interest in using Large Language Models (LLMs) for interactive and emotionally rich experiences grows, virtual pet companionship emerges as a novel yet underexplored application. Existing approaches focus on basic pet role-playing interactions without systematically benchmarking LLMs for comprehensive companionship. In this paper, we introduce Pet-Bench, a dedicated benchmark that evaluates LLMs across both self-interaction and human-interaction dimensions. Unlike prior work, Pet-Bench emphasizes self-evolution and developmental behaviors alongside interactive engagement, offering a more realistic reflection of pet companionship. It features diverse tasks such as intelligent scheduling, memory-based dialogues, and psychological conversations, with over 7,500 interaction instances designed to simulate complex pet behaviors. Evaluation of 28 LLMs reveals significant performance variations linked to model size and inherent capabilities, underscoring the need for specialized optimization in this domain. Pet-Bench serves as a foundational resource for benchmarking pet-related LLM abilities and advancing emotionally immersive human–pet interactions.