Sustainable techniques to improve Data Quality for training image-based explanatory models for Recommender Systems

📅 2024-07-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Image-driven recommender systems suffer from low-quality explanations due to sparse training data and label noise, while conventional training enrichment (larger models, massive data gathering) increases their carbon footprint. Method: This paper proposes a low-carbon, efficient data quality enhancement paradigm that integrates positive-unlabelled (PU) learning to identify high-confidence negative samples, geometric/semantic transformations for data augmentation, and CLIP-guided text-to-image generation. A multi-model ensemble evaluation framework is further introduced for automated quality verification. Contribution/Results: Evaluated on multiple real-world restaurant recommendation explanation datasets, the method improves the average ranking metrics of three state-of-the-art explainable models by 5% without increasing carbon emissions or computational overhead, jointly addressing data quality improvement and sustainable model training.
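The geometric transforms the summary mentions can be illustrated with a minimal, hypothetical sketch in plain NumPy; the particular transform set (flip, 90-degree rotation, 90% crop) is an illustrative assumption, not the paper's exact pipeline:

```python
import numpy as np

def augment(image, rng):
    """Apply simple label-preserving geometric transforms to an HWC image array.

    The transforms are chosen so that the augmented image keeps the same
    explanation label as the original (flips, rotations, mild crops).
    """
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                 # random horizontal flip
    out = np.rot90(out, rng.integers(0, 4))   # random 90-degree rotation
    h, w = out.shape[:2]
    ch, cw = int(h * 0.9), int(w * 0.9)       # 90% random crop
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return out[top:top + ch, left:left + cw, :]
```

Each call yields a slightly different variant of the same image, so existing labels can be reused for the augmented copies without new annotation effort.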

📝 Abstract
Visual explanations based on user-uploaded images are an effective and self-contained approach to providing transparency in Recommender Systems (RS), but intrinsic limitations of the data used in this explainability paradigm force existing approaches to rely on poor-quality training data that is highly sparse and suffers from labelling noise. Popular training enrichment approaches such as model enlargement or massive data gathering are expensive and environmentally unsustainable, so we seek to provide better visual explanations for RS while aligning with the principles of Responsible AI. In this work, we research the intersection of effective and sustainable training enrichment strategies for visual-based RS explainability models by developing three novel strategies that focus on training Data Quality: 1) selection of reliable negative training examples using Positive-Unlabelled Learning, 2) transform-based data augmentation, and 3) text-to-image generative-based data augmentation. Integrating these strategies into three state-of-the-art explainability models increases their performance in relevant ranking metrics by 5% without penalizing their practical long-term sustainability, as tested on multiple real-world restaurant recommendation explanation datasets.
Problem

Research questions and friction points this paper is trying to address.

Improving data quality for image-based recommender system explainability models
Addressing sparse and noisy training data in visual explanations
Developing sustainable enrichment strategies without environmental cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Positive-unlabelled Learning for reliable negatives
Transform-based data augmentation technique
Text-to-image generative data augmentation
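The first strategy, selecting reliable negatives with Positive-Unlabelled Learning, can be sketched as a confidence-threshold filter: train a positive-vs-unlabelled classifier, then keep only the unlabelled examples scored well below the known positives. This is a hypothetical illustration using scikit-learn, not the authors' implementation; the logistic-regression classifier and the 5% quantile threshold are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def reliable_negatives(X_pos, X_unlabelled, quantile=0.05):
    """Select reliable negatives from unlabelled data via PU learning.

    Fits a classifier treating unlabelled examples as tentative negatives,
    then keeps only the unlabelled examples whose positive-class score
    falls below the `quantile` of the scores of the known positives.
    """
    X = np.vstack([X_pos, X_unlabelled])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unlabelled))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    threshold = np.quantile(clf.predict_proba(X_pos)[:, 1], quantile)
    mask = clf.predict_proba(X_unlabelled)[:, 1] < threshold
    return X_unlabelled[mask], mask
```

Unlabelled examples that the classifier cannot distinguish from positives are left out of the negative set, which is the kind of labelling-noise reduction the paper targets.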
Jorge Paz-Ruza
Interim Professor, Universidade da Coruña
Frugal Machine Learning · Responsible AI · Green AI

David Esteban-Martínez

A. Alonso-Betanzos
LIDIA Group, CITIC, Universidade da Coruña, 15071, Spain

B. Guijarro-Berdiñas
LIDIA Group, CITIC, Universidade da Coruña, 15071, Spain