🤖 AI Summary
To address the challenge that network operators cannot access end-user Quality of Experience (QoE) metrics for encrypted video traffic, this paper proposes an application-agnostic method for end-to-end QoE estimation that requires no cooperation from the client side. We introduce the first objective QoE modeling framework specifically designed for IMVCAs and VCAs (e.g., WhatsApp). It combines network-layer traffic feature extraction with a dual-task learning model that jointly optimizes no-reference perceptual quality assessment based on PIQE (Perception-based Image Quality Evaluator) and dynamic FPS prediction. Evaluated on a dataset of 25,680 seconds of real-world encrypted WhatsApp traffic, the method achieves 85.2% accuracy in FPS prediction (error ≤ 2 FPS) and 90.2% accuracy in PIQE-based quality classification. Because it relies on neither application-layer metrics nor client instrumentation, the approach gives operators a deployable, generalizable QoE monitoring capability, enabling scalable, privacy-preserving QoE awareness across the network.
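The summary mentions network-layer traffic feature extraction but does not list the features. As a minimal sketch, assuming packet records of the form (timestamp in seconds relative to session start, size in bytes) have already been captured from the encrypted flow, the Python below groups packets into one-second windows and computes a few per-second features. The function name `per_second_features` and the feature set are hypothetical illustrations, not the paper's method.

```python
import math
from collections import defaultdict
from statistics import mean


def per_second_features(packets):
    """Group (timestamp_s, size_bytes) packet records into 1-second windows.

    Hypothetical feature set (an assumption, not the paper's features):
    packet count, total bytes, mean packet size, and mean inter-arrival time.
    """
    windows = defaultdict(list)
    for ts, size in packets:
        windows[math.floor(ts)].append((ts, size))

    features = {}
    for sec in sorted(windows):
        pkts = sorted(windows[sec])  # order packets by timestamp within the window
        sizes = [size for _, size in pkts]
        gaps = [b[0] - a[0] for a, b in zip(pkts, pkts[1:])]
        features[sec] = {
            "pkt_count": len(pkts),
            "bytes": sum(sizes),
            "mean_size": mean(sizes),
            "mean_iat": mean(gaps) if gaps else 0.0,  # 0.0 for single-packet windows
        }
    return features


if __name__ == "__main__":
    trace = [(0.0, 1000), (0.5, 1200), (1.2, 800)]
    for sec, feats in per_second_features(trace).items():
        print(sec, feats)
```

The one-second window is an assumption chosen to line up with per-second prediction targets; the paper's actual window size and feature set may differ.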
📝 Abstract
Instant Messaging-Based Video Call Applications (IMVCAs) and Video Conferencing Applications (VCAs) have become integral to modern communication. Ensuring a high Quality of Experience (QoE) for their users is critical for network operators, since network conditions significantly affect user QoE. However, traffic encryption prevents network operators from accessing QoE metrics measured on end devices. Existing solutions estimate QoE metrics from the encrypted traffic traversing the network, and the most advanced approaches rely on machine learning models. These models, however, need ground-truth QoE metrics for training and validation, which poses a challenge because not all video applications provide such metrics. To address this challenge, we propose an application-agnostic approach for objective QoE estimation from encrypted traffic. We obtain key video QoE metrics independently of the video application, which makes the approach broadly applicable to proprietary IMVCAs and VCAs. To validate our solution, we created a diverse dataset of WhatsApp video sessions under various network conditions, comprising 25,680 seconds of traffic data and QoE metrics. Our evaluation shows high performance across the entire dataset, with 85.2% accuracy for FPS predictions within an error margin of two FPS and 90.2% accuracy for PIQE-based quality rating classification.
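The two reported metrics can be made concrete. The Python sketch below computes FPS accuracy as the fraction of predictions within ±2 FPS of the ground truth, maps PIQE scores (0 to 100, lower is better) to rating classes, and then measures rating classification accuracy. The band edges follow the conventional PIQE quality scale (as listed, for example, in MATLAB's `piqe` documentation) and are an assumption, because the abstract does not give the paper's exact rating bins. All function names are hypothetical.

```python
def fps_accuracy(y_true, y_pred, tol=2.0):
    """Fraction of samples whose predicted FPS is within `tol` FPS of the truth."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal-length, non-empty sequences")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)


# Assumed PIQE rating bands (upper bound inclusive); PIQE scores lie in
# [0, 100] and lower means better quality. Not confirmed by the paper.
PIQE_BANDS = [(20, "Excellent"), (35, "Good"), (50, "Fair"), (80, "Poor"), (100, "Bad")]


def piqe_rating(score):
    """Map a PIQE score to a quality rating class."""
    if not 0 <= score <= 100:
        raise ValueError("PIQE score must be in [0, 100]")
    for upper, label in PIQE_BANDS:
        if score <= upper:
            return label


def classification_accuracy(true_labels, pred_labels):
    """Fraction of samples whose predicted rating matches the true rating."""
    if len(true_labels) != len(pred_labels) or not true_labels:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(t == p for t, p in zip(true_labels, pred_labels)) / len(true_labels)


if __name__ == "__main__":
    print(fps_accuracy([30, 24, 15, 30], [29, 27, 15, 31]))  # 0.75
    true_r = [piqe_rating(s) for s in (12, 40, 70)]
    pred_r = [piqe_rating(s) for s in (18, 52, 65)]
    print(classification_accuracy(true_r, pred_r))
```

A tolerance-based accuracy is used for FPS because small frame-rate deviations are hard to perceive, whereas the PIQE rating is treated as a discrete class, so exact-match accuracy applies.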