DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance

📅 2025-01-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models (LLMs) perform unevenly across multi-domain tasks and generalize poorly across disciplines. To address this, the paper proposes the Diverse Fingerprint Ensemble (DFPE). DFPE uses response fingerprints to quantify and preserve model diversity, and combines three steps: clustering models by their response fingerprints, filtering out underperforming models per subject with a quantile threshold, and fusing the remaining models with weights derived from their subject-wise validation accuracy. On the MMLU benchmark, DFPE improves on the best single model by 3% in overall accuracy and by 5% in subject-level average accuracy, indicating better cross-domain generalization and robustness. Its core contributions are: (1) response-fingerprint-based diversity modeling; (2) subject-level dynamic filtering; and (3) an accuracy-aware weighted ensemble.

📝 Abstract

Large Language Models (LLMs) have shown remarkable capabilities across various natural language processing tasks but often struggle to excel uniformly in diverse or complex domains. We propose a novel ensemble method, Diverse Fingerprint Ensemble (DFPE), which leverages the complementary strengths of multiple LLMs to achieve more robust performance. Our approach involves: (1) clustering models based on response "fingerprint" patterns, (2) applying a quantile-based filtering mechanism to remove underperforming models at a per-subject level, and (3) assigning adaptive weights to remaining models based on their subject-wise validation accuracy. In experiments on the Massive Multitask Language Understanding (MMLU) benchmark, DFPE outperforms the best single model by 3% in overall accuracy and 5% in discipline-level accuracy. This method increases the robustness and generalization of LLMs and underscores how model selection, diversity preservation, and performance-driven weighting can effectively address challenging, multi-faceted language understanding tasks.
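
To make steps (2) and (3) of the abstract concrete, here is a minimal Python sketch of per-subject quantile filtering followed by accuracy-weighted voting. It is an illustration under stated assumptions, not the authors' implementation: the function names, the model names, the quantile level `q=0.5`, and the linear normalization of accuracies into weights are all hypothetical choices, and the fingerprint clustering of step (1) is assumed to have happened upstream and is not shown.

```python
from collections import defaultdict


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def select_and_weight(val_acc, q=0.5):
    """For one subject, keep the models whose validation accuracy is at or
    above the q-quantile, then normalize their accuracies into weights.

    Assumption: linear normalization; the paper's weighting may differ.
    """
    threshold = quantile(list(val_acc.values()), q)
    kept = {m: a for m, a in val_acc.items() if a >= threshold}
    total = sum(kept.values())
    if total == 0:
        # All kept models scored zero: fall back to uniform weights.
        return {m: 1.0 / len(kept) for m in kept}
    return {m: a / total for m, a in kept.items()}


def weighted_vote(answers, weights):
    """Return the answer option with the largest total weight."""
    scores = defaultdict(float)
    for model, weight in weights.items():
        scores[answers[model]] += weight
    return max(scores, key=scores.get)


# Hypothetical validation accuracies for a single subject.
val_acc = {"model_a": 0.80, "model_b": 0.60, "model_c": 0.40, "model_d": 0.70}
# Hypothetical answers of each model to one multiple-choice question.
answers = {"model_a": "B", "model_b": "C", "model_c": "C", "model_d": "A"}

weights = select_and_weight(val_acc, q=0.5)
print(weights)
print(weighted_vote(answers, weights))
```

In this toy example the two weaker models fall below the median threshold and are dropped for the subject, so their shared answer carries no weight in the vote. Running the same selection separately for every subject is what makes the filtering "per-subject" in the sense the abstract describes.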

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Complex Linguistic Tasks

Performance Degradation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diverse Fingerprint Ensemble

Model Aggregation

Weighted Performance Enhancement

🔎 Similar Papers

Can Watermarked LLMs be Identified by Users via Crafted Prompts?