🤖 AI Summary
Problem: Existing foundation model evaluation methods, including human-centered paradigms, overemphasize output correctness and neglect the interplay between response quality and interaction, so they cannot account for the mechanisms underlying user experience (UX).
Method: We propose QoNext, the first framework to adapt the Quality of Experience (QoE) paradigm from networking and multimedia to the evaluation of foundation models. QoNext identifies the experiential factors that shape user experience and varies them in controlled experiments, collecting human ratings under each configuration. From these ratings we build a QoE-oriented database and train predictive models that estimate perceived user experience from measurable system parameters.
Contribution/Results: QoNext moves beyond evaluation based on output correctness alone by treating user experience as the joint product of response quality and interaction. It enables proactive, fine-grained evaluation and provides actionable guidance for optimizing foundation models in productized services.
📝 Abstract
Existing evaluations of foundation models, including recent human-centric approaches, fail to capture what truly matters: the user's experience during interaction. Current methods treat evaluation as a matter of output correctness alone, overlooking the fact that user satisfaction emerges from the interplay between response quality and interaction. This limits their ability to account for the mechanisms underlying user experience. To address this gap, we introduce QoNext, the first framework that adapts Quality of Experience (QoE) principles from networking and multimedia to the assessment of foundation models. QoNext identifies the experiential factors that shape user experience and incorporates them into controlled experiments, in which human ratings are collected under varied configurations. From these studies we construct a QoE-oriented database and train predictive models that estimate perceived user experience from measurable system parameters. Our results demonstrate that QoNext not only enables proactive and fine-grained evaluation but also provides actionable guidance for optimizing foundation models in productized services.
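The pipeline described above (ratings collected per configuration, then a predictor fitted on measurable system parameters) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the single parameter `response_delay_seconds`, the 1-5 rating scale, the placeholder ratings, and the log-linear model form. The actual QoNext factors and models may differ.

```python
from collections import defaultdict
from math import log2

# Hypothetical placeholder ratings, NOT data from the paper:
# each entry is (response_delay_seconds, rating on an assumed 1-5 scale).
RATINGS = [
    (1.0, 5), (1.0, 4),
    (2.0, 4), (2.0, 4),
    (4.0, 3), (4.0, 3),
    (8.0, 2), (8.0, 2),
]


def mean_opinion_scores(ratings):
    """Average the ratings collected under each configuration."""
    by_config = defaultdict(list)
    for delay, score in ratings:
        by_config[delay].append(score)
    return {delay: sum(s) / len(s) for delay, s in sorted(by_config.items())}


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train(ratings):
    """Fit mean rating against log2(delay); the log-linear form is an assumption."""
    mos = mean_opinion_scores(ratings)
    xs = [log2(delay) for delay in mos]
    ys = list(mos.values())
    return fit_line(xs, ys)


def predict(model, delay):
    """Estimate the perceived score for a delay, clamped to the 1-5 scale."""
    slope, intercept = model
    return min(5.0, max(1.0, slope * log2(delay) + intercept))


model = train(RATINGS)
estimate = predict(model, 3.0)  # estimate for a configuration that was never rated
```

Once fitted, such a predictor can score a configuration before any user sees it, which is the sense in which this style of evaluation is proactive. A real study would use several parameters, many more raters, and held-out validation.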