Act-as-Pet: Benchmarking the Abilities of Large Language Models as E-Pets in Social Network Services

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing evaluations of large language models (LLMs) as social companions, particularly virtual pets, lack systematic, multidimensional assessment frameworks. Method: We propose Pet-Bench, the first benchmark explicitly designed to evaluate LLMs in emotionally grounded human–pet interaction. It introduces two novel dimensions, "self-evolution" and "developmental behavior", moving beyond conventional role-playing paradigms. The benchmark spans memory-augmented dialogue, psychological interaction, intelligent scheduling, and cross-modal simulation, grounded in over 7,500 real-world human–pet interaction instances. Contribution/Results: Empirical evaluation of 28 mainstream LLMs reveals significant performance variations linked to model scale and inherent capabilities, indicating that affective companionship quality does not track model size alone. Pet-Bench establishes the first open-source, domain-specific evaluation suite for virtual pets, providing both a standardized benchmark and an empirical foundation for targeted model customization and optimization in affective AI.

📝 Abstract
As interest in using Large Language Models (LLMs) for interactive and emotionally rich experiences grows, virtual pet companionship emerges as a novel yet underexplored application. Existing approaches focus on basic pet role-playing interactions without systematically benchmarking LLMs for comprehensive companionship. In this paper, we introduce Pet-Bench, a dedicated benchmark that evaluates LLMs across both self-interaction and human-interaction dimensions. Unlike prior work, Pet-Bench emphasizes self-evolution and developmental behaviors alongside interactive engagement, offering a more realistic reflection of pet companionship. It features diverse tasks such as intelligent scheduling, memory-based dialogues, and psychological conversations, with over 7,500 interaction instances designed to simulate complex pet behaviors. Evaluation of 28 LLMs reveals significant performance variations linked to model size and inherent capabilities, underscoring the need for specialized optimization in this domain. Pet-Bench serves as a foundational resource for benchmarking pet-related LLM abilities and advancing emotionally immersive human-pet interactions.
Problem

Research questions and friction points this paper is trying to address.

Benchmarking LLMs for realistic virtual pet companionship
Evaluating self-evolution and interactive engagement in e-pets
Assessing performance variations among 28 LLMs in pet-related tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Pet-Bench for LLM pet companionship benchmarking
Evaluates self-evolution and interactive engagement behaviors
Tests 28 LLMs with 7,500+ complex interaction instances