🤖 AI Summary
This paper addresses the challenge of characterizing category-level performance upper bounds for classification models, and proposes a Pareto-optimal framework for performance enhancement. Methodologically, we introduce the category-level influence function and influence vector, novel metrics that quantify the heterogeneous impact of individual training samples on predictions across classes, and formulate sample reweighting as a linear program that jointly improves all class accuracies without degrading any. Our core contributions are: (1) modeling performance limits at the category level; (2) replacing global accuracy optimization with a Pareto-improvement criterion; and (3) providing an interpretable, computationally tractable mechanism for multi-class collaborative optimization. Experiments on synthetic data and on standard vision and text benchmarks demonstrate that our method closely approaches the category-level performance ceiling, delivering comprehensive and equitable gains across all classes.
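For intuition, the influence vector can be sketched from the classical influence-function formulation of Koh and Liang (2017). The per-class aggregation below, including the validation split $V_c$, the loss $L$, and the Hessian $H_{\hat\theta}$ at the trained parameters $\hat\theta$, is an illustrative assumption rather than the paper's exact definition:

```latex
% Classical influence of upweighting a training point z on the loss at a
% test point z' (Koh & Liang, 2017):
%   \mathcal{I}(z, z') = -\nabla_\theta L(z', \hat\theta)^\top
%                          H_{\hat\theta}^{-1} \nabla_\theta L(z, \hat\theta)
% A plausible category-wise variant (assumed here) averages this over a
% held-out validation set V_c of class c, giving one coordinate per class:
\mathcal{I}_c(z) = -\frac{1}{|V_c|} \sum_{z' \in V_c}
    \nabla_\theta L(z', \hat\theta)^\top H_{\hat\theta}^{-1}
    \nabla_\theta L(z, \hat\theta),
\qquad
\mathbf{I}(z) = \bigl(\mathcal{I}_1(z), \dots, \mathcal{I}_C(z)\bigr).
```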
📝 Abstract
Data-centric learning seeks to improve model performance from the perspective of data quality and has been drawing increasing attention in the machine learning community. Among its key tools, influence functions provide a powerful framework for quantifying the impact of individual training samples on model predictions, enabling practitioners to identify detrimental samples and retrain models on a cleaner dataset for improved performance. However, most existing work focuses on the question "what data benefits the learning model?" In this paper, we take a step further and investigate a more fundamental question: "what is the performance ceiling of the learning model?" Unlike prior studies that primarily measure improvement through overall accuracy, we emphasize category-wise accuracy and aim for Pareto improvements, ensuring that every class benefits rather than allowing trade-offs where some classes improve at the expense of others. To address this challenge, we propose category-wise influence functions and introduce an influence vector that quantifies the impact of each training sample across all categories. Leveraging these influence vectors, we develop a principled criterion to determine whether a model can still be improved, and we design a linear-programming-based sample reweighting framework that achieves Pareto performance improvements. Through extensive experiments on synthetic datasets and on vision and text benchmarks, we demonstrate the effectiveness of our approach in both estimating and achieving performance improvements across the categories of interest.
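As a concrete illustration of the linear-programming step, the sketch below searches for per-sample weight perturbations whose predicted effect is non-negative for every class. The influence matrix `I`, the total-gain objective, and the `[-1, 1]` bounds are assumptions made for the sake of the example, not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical influence matrix: entry I[i, c] estimates how much upweighting
# training sample i would improve the model's performance on class c
# (positive = beneficial). In practice it would come from category-wise
# influence functions; here it is random placeholder data.
rng = np.random.default_rng(0)
n_samples, n_classes = 500, 10
I = rng.normal(size=(n_samples, n_classes))

# Decision variables: per-sample perturbations eps_i in [-1, 1], so the
# retraining weight of sample i becomes 1 + eps_i.
# Objective (assumed): maximize the total predicted gain over all classes,
#   max  sum_c sum_i eps_i * I[i, c]   <=>   min  -(I @ 1) . eps
c_obj = -I.sum(axis=1)

# Pareto constraints: no class may be predicted to get worse,
#   sum_i eps_i * I[i, c] >= 0  for every class c   <=>   -I^T eps <= 0
A_ub = -I.T
b_ub = np.zeros(n_classes)

res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub,
              bounds=[(-1.0, 1.0)] * n_samples, method="highs")
eps = res.x
print("Predicted per-class gains:", I.T @ eps)  # all entries should be >= 0
```

The key design choice is that the Pareto requirement enters as hard constraints, one per class, so any feasible solution predicts that no class is harmed; if the optimum is eps = 0, then under this estimate no reweighting can improve any class without hurting another, which serves as the stopping criterion for "can the model still be improved?"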