🤖 AI Summary
This paper addresses the challenge of characterizing category-level performance upper bounds for classification models, and proposes a Pareto-optimal framework for performance enhancement. Methodologically, we introduce the category-level influence function and influence vector, novel metrics that quantify the heterogeneous impact of individual training samples on predictions across classes, and formulate sample reweighting as a linear program that jointly improves all class accuracies without degrading any. Our core contributions are: (1) modeling performance limits at the category level; (2) replacing global accuracy optimization with a Pareto-improvement criterion; and (3) providing an interpretable, computationally tractable mechanism for multi-class collaborative optimization. Experiments on synthetic data and on standard vision and text benchmarks demonstrate that our method closely approaches the category-level performance ceiling, delivering comprehensive and equitable gains across all classes.
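For intuition, the influence vector can be sketched from the classical influence-function formulation of Koh and Liang (2017). The per-class aggregation below, including the validation split $V_c$, the loss $L$, and the Hessian $H_{\hat\theta}$ at the trained parameters $\hat\theta$, is an illustrative assumption rather than the paper's exact definition:

```latex
% Classical influence of upweighting a training point z on the loss at a
% test point z' (Koh & Liang, 2017):
%   \mathcal{I}(z, z') = -\nabla_\theta L(z', \hat\theta)^\top
%                          H_{\hat\theta}^{-1} \nabla_\theta L(z, \hat\theta)
% A plausible category-wise variant (assumed here) averages this over a
% held-out validation set V_c of class c, giving one coordinate per class:
\mathcal{I}_c(z) = -\frac{1}{|V_c|} \sum_{z' \in V_c}
    \nabla_\theta L(z', \hat\theta)^\top H_{\hat\theta}^{-1}
    \nabla_\theta L(z, \hat\theta),
\qquad
\mathbf{I}(z) = \bigl(\mathcal{I}_1(z), \dots, \mathcal{I}_C(z)\bigr).
```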
📝 Abstract
Data-centric learning seeks to improve model performance from the perspective of data quality and has been drawing increasing attention in the machine learning community. Among its key tools, influence functions provide a powerful framework for quantifying the impact of individual training samples on model predictions, enabling practitioners to identify detrimental samples and retrain models on a cleaner dataset for improved performance. However, most existing work focuses on the question "what data benefits the learning model?" In this paper, we take a step further and investigate a more fundamental question: "what is the performance ceiling of the learning model?" Unlike prior studies that primarily measure improvement through overall accuracy, we emphasize category-wise accuracy and aim for Pareto improvements, ensuring that every class benefits rather than allowing trade-offs where some classes improve at the expense of others. To address this challenge, we propose category-wise influence functions and introduce an influence vector that quantifies the impact of each training sample across all categories. Leveraging these influence vectors, we develop a principled criterion to determine whether a model can still be improved, and we design a linear-programming-based sample reweighting framework that achieves Pareto performance improvements. Through extensive experiments on synthetic datasets and on vision and text benchmarks, we demonstrate the effectiveness of our approach in both estimating and achieving performance improvements across the categories of interest.
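As a concrete illustration of the linear-programming step, the sketch below searches for per-sample weight perturbations whose predicted effect is non-negative for every class. The influence matrix `I`, the total-gain objective, and the `[-1, 1]` bounds are assumptions made for the sake of the example, not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical influence matrix: entry I[i, c] estimates how much upweighting
# training sample i would improve the model's performance on class c
# (positive = beneficial). In practice it would come from category-wise
# influence functions; here it is random placeholder data.
rng = np.random.default_rng(0)
n_samples, n_classes = 500, 10
I = rng.normal(size=(n_samples, n_classes))

# Decision variables: per-sample perturbations eps_i in [-1, 1], so the
# retraining weight of sample i becomes 1 + eps_i.
# Objective (assumed): maximize the total predicted gain over all classes,
#   max  sum_c sum_i eps_i * I[i, c]   <=>   min  -(I @ 1) . eps
c_obj = -I.sum(axis=1)

# Pareto constraints: no class may be predicted to get worse,
#   sum_i eps_i * I[i, c] >= 0  for every class c   <=>   -I^T eps <= 0
A_ub = -I.T
b_ub = np.zeros(n_classes)

res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub,
              bounds=[(-1.0, 1.0)] * n_samples, method="highs")
eps = res.x
print("Predicted per-class gains:", I.T @ eps)  # all entries should be >= 0
```

The key design choice is that the Pareto requirement enters as hard constraints, one per class, so any feasible solution predicts that no class is harmed; if the optimum is eps = 0, then under this estimate no reweighting can improve any class without hurting another, which serves as the stopping criterion for "can the model still be improved?"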