Help the machine to help you: an evaluation in the wild of egocentric data cleaning via skeptical learning

📅 2025-10-24

📈 Citations: 0

✨ Influential: 0

career value

176K/year

🤖 AI Summary

To address high noise in user-annotated data and the ecological invalidity of offline evaluation in personal digital assistants, this paper proposes and empirically validates SKEL, a skepticism-aware learning framework grounded in egocentric data. SKEL establishes, for the first time in real-world mobile settings, a closed-loop user feedback system: a four-week longitudinal study via the iLog app integrates active annotation, passive sensor logging, and dynamic skepticism modeling to enable real-time user correction of behavioral labels. Its core contribution lies in embedding user burden minimization into an iterative label purification pipeline—preserving fine-grained contextual awareness while continuously refining noisy labels. Experiments demonstrate that SKEL significantly reduces annotation effort (47% average reduction) and improves label accuracy (+19.3%), establishing a deployable paradigm for online evaluation and optimization in personalized data cleaning.

Technology Category

Application Category

📝 Abstract

Any digital personal assistant, whether used to support task performance, answer questions, or manage work and daily life, including fitness schedules, requires high-quality annotations to function properly. However, user annotations, whether actively produced or inferred from context (e.g., data from smartphone sensors), are often subject to errors and noise. Previous research on Skeptical Learning (SKEL) addressed the issue of noisy labels by comparing offline active annotations with passive data, allowing for an evaluation of annotation accuracy. However, this evaluation did not include confirmation from end-users, the best judges of their own context. In this study, we evaluate SKEL's performance in real-world conditions with actual users who can refine the input labels based on their current perspectives and needs. The study involves university students using the iLog mobile application on their devices over a period of four weeks. The results highlight the challenges of finding the right balance between user effort and data quality, as well as the potential benefits of using SKEL, which include reduced annotation effort and improved quality of collected data.

Problem

Research questions and friction points this paper is trying to address.

Evaluating Skeptical Learning with real user feedback

Addressing noisy annotations in personal assistant data

Balancing user effort and data quality improvement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Skeptical Learning evaluates noisy labels with user feedback

Real-world testing with mobile app users over four weeks

Balances annotation effort reduction with improved data quality

🔎 Similar Papers

SoK: Machine Learning for Misinformation Detection