Beyond Simple Averaging: Improving NLP Ensemble Performance with Topological-Data-Analysis-Based Weighting

📅 2024-02-22
🏛️ International Conference on Data Science and Advanced Analytics
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Standard equal-weight averaging in NLP model ensembles overlooks individual performance disparities and structural differences among models. To address this, we propose a topology-aware dynamic weighting framework grounded in Topological Data Analysis (TDA). Specifically, we introduce persistent homology distance to quantify structural similarity among model prediction distributions and jointly optimize model weights by integrating both per-model accuracy and topological distance. Evaluated on multiple text classification benchmarks, our method achieves consistent improvements in overall accuracy (average +1.2%) and significantly enhances uncertainty calibration—reducing Expected Calibration Error (ECE) by 23%–38%. The core contribution lies in the first application of TDA to ensemble learning for NLP, enabling dual-objective weight optimization that is both performance-driven and structure-aware. This establishes a novel paradigm for trustworthy, interpretable NLP ensembles.

Technology Category

Application Category

📝 Abstract
In machine learning, ensembles are important tools for improving the model performance. In natural language processing specifically, ensembles boost the performance of a method due to multiple large models available in open source. However, existing approaches mostly rely on simple averaging of predictions by ensembles with equal weights for each model, ignoring differences in the quality and conformity of models. We propose to estimate weights for ensembles of NLP models using not only knowledge of their individual performance but also their similarity to each other. By adopting distance measures based on Topological Data Analysis (TDA), we improve our ensemble. The quality improves for both text classification accuracy and relevant uncertainty estimation.
Problem

Research questions and friction points this paper is trying to address.

Model Integration
Weighted Aggregation
Performance Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Topological Data Analysis
Model Integration Scoring
Error Prediction Precision
🔎 Similar Papers
No similar papers found.