DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance

📅 2025-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the uneven performance and poor cross-disciplinary generalization of large language models (LLMs) on multi-domain tasks, this paper proposes the Diverse Fingerprint Ensemble (DFPE). DFPE uses response fingerprints to quantify and preserve diversity among models. It combines subject-level K-means clustering, quantile-thresholded dynamic filtering, and accuracy-aware adaptive weighting into a fine-grained, robust multi-model ensemble. On the MMLU benchmark, DFPE achieves a 3% absolute improvement in overall accuracy and a 5% gain in subject-level average accuracy over the best single model, improving cross-domain generalization and robustness. Its core contributions are: (1) diversity modeling based on response fingerprints; (2) dynamic filtering at subject granularity; and (3) an accuracy-aware weighted ensembling scheme.
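The summary describes clustering models by their response fingerprints and keeping diverse members. The sketch below illustrates that idea under hypothetical assumptions. A model's fingerprint is taken to be its 0/1 correctness vector on a shared set of probe questions. Clustering is plain K-means. The most accurate model in each cluster is kept as that cluster's representative. The helpers `fingerprint`, `kmeans`, and `pick_representatives` are illustrative names, not the paper's code, and the paper's exact fingerprint definition and selection rule may differ.

```python
from typing import Dict, List, Sequence


def fingerprint(answers: Sequence[str], gold: Sequence[str]) -> List[float]:
    """Hypothetical fingerprint: 1.0 where the model's answer matches the gold label."""
    return [1.0 if a == g else 0.0 for a, g in zip(answers, gold)]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance (equals Hamming distance on 0/1 vectors)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's K-means; centroids start at the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pick_representatives(fps: Dict[str, List[float]], k: int) -> List[str]:
    """Cluster models by fingerprint and keep the most accurate model per cluster."""
    names = list(fps)
    labels = kmeans([fps[n] for n in names], k)
    best: Dict[int, tuple] = {}
    for name, lab in zip(names, labels):
        acc = sum(fps[name]) / len(fps[name])
        # Strict ">" means ties go to the model listed first.
        if lab not in best or acc > best[lab][1]:
            best[lab] = (name, acc)
    return sorted(name for name, _ in best.values())


gold = ["A", "B", "C", "D"]
fps = {
    "m1": fingerprint(["A", "B", "C", "A"], gold),  # 3/4 correct
    "m2": fingerprint(["A", "B", "D", "A"], gold),  # 2/4, errors overlap m1's
    "m3": fingerprint(["B", "A", "C", "D"], gold),  # 2/4, different error profile
    "m4": fingerprint(["C", "A", "C", "D"], gold),  # same profile as m3
}
print(pick_representatives(fps, k=2))  # → ['m1', 'm3']
```

On binary fingerprints, squared Euclidean distance counts the questions on which two models differ in correctness. K-means therefore groups models that fail in similar places. Keeping one model per group is one simple way to retain complementary strengths rather than redundant ones.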

📝 Abstract
Large Language Models (LLMs) have shown remarkable capabilities across various natural language processing tasks but often struggle to excel uniformly in diverse or complex domains. We propose a novel ensemble method, the Diverse Fingerprint Ensemble (DFPE), which leverages the complementary strengths of multiple LLMs to achieve more robust performance. Our approach involves: (1) clustering models based on their response "fingerprint" patterns, (2) applying a quantile-based filtering mechanism to remove underperforming models at a per-subject level, and (3) assigning adaptive weights to the remaining models based on their subject-wise validation accuracy. In experiments on the Massive Multitask Language Understanding (MMLU) benchmark, DFPE outperforms the best single model by 3% in overall accuracy and by 5% in discipline-level accuracy. This method increases the robustness and generalization of LLMs and underscores how model selection, diversity preservation, and performance-driven weighting can effectively address challenging, multi-faceted language understanding tasks.
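Steps (2) and (3) of the abstract can be sketched for a single subject as follows. This is an illustration under stated assumptions, not the authors' implementation. The quantile level `q`, the linear-interpolation quantile rule, and the use of weights proportional to validation accuracy are hypothetical choices. The helpers `quantile`, `subject_weights`, and `weighted_vote` are also hypothetical. The abstract does not give the exact threshold or weighting formula.

```python
import math
from collections import defaultdict
from typing import Dict, List


def quantile(values: List[float], q: float) -> float:
    """Linear-interpolation quantile (NumPy's default rule), 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def subject_weights(val_acc: Dict[str, float], q: float = 0.5) -> Dict[str, float]:
    """Filter, then weight, the models for ONE subject.

    Models below the q-quantile of validation accuracy are dropped. Survivors get
    weights proportional to their accuracy, normalised to sum to 1 (an assumed rule).
    """
    # Clamp so float rounding can never push the threshold above the best model.
    threshold = min(quantile(list(val_acc.values()), q), max(val_acc.values()))
    kept = {m: a for m, a in val_acc.items() if a >= threshold}
    total = sum(kept.values())
    if total == 0:  # every survivor scored 0: fall back to uniform weights
        return {m: 1.0 / len(kept) for m in kept}
    return {m: a / total for m, a in kept.items()}


def weighted_vote(answers: Dict[str, str], weights: Dict[str, float]) -> str:
    """Sum the weights behind each answer option and return the heaviest one."""
    scores: Dict[str, float] = defaultdict(float)
    for model, w in weights.items():
        scores[answers[model]] += w
    # Sorting first makes ties resolve to the alphabetically first option.
    return max(sorted(scores), key=lambda opt: scores[opt])


# Validation accuracy of four models on one hypothetical subject.
val_acc = {"m1": 0.9, "m2": 0.8, "m3": 0.7, "m4": 0.6}
answers = {"m1": "B", "m2": "C", "m3": "C", "m4": "C"}
w = subject_weights(val_acc, q=0.5)  # keeps m1 and m2 only
print(weighted_vote(answers, w))  # → B (an unweighted vote over all four gives C)
```

In this example, three models agree on C. The filter removes the two weakest of them, so the more accurate model's answer wins. Calling `subject_weights` separately for each MMLU subject gives every subject its own roster and weights. That is what makes the filtering and weighting subject-level rather than global.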
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Complex Linguistic Tasks
Performance Degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diverse Fingerprint Ensemble
Model Aggregation
Weighted Performance Enhancement