Harmonizing Diverse Models: A Layer-wise Merging Strategy for Consistent Generation

📅 2025-10-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address inconsistent responses from large language models (LLMs) to semantically equivalent queries in Retrieval-Augmented Generation (RAG) systems, this paper proposes a layer-wise consistency-aware model fusion method. Without requiring extensive human annotation, the approach constructs synthetic triplets and introduces a triplet loss function that explicitly enforces consistency among intermediate-layer activations. It further designs a dynamic weight fusion mechanism guided by inter-layer activation similarity to synergistically integrate knowledge from multiple specialized models. The key innovation lies in explicitly modeling hidden-layer activation consistency as the primary fusion criterion, thereby jointly enhancing semantic robustness and generation stability. Experimental results demonstrate that the fused model improves response similarity—measured via embedding-based cosine similarity—by 47.5% over strong baselines, significantly boosting output consistency and reliability in industrial-scale RAG deployments.
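The triplet objective described above can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the cosine distance choice, and the margin value are assumptions for illustration. The idea: for an anchor query, a paraphrase (positive) should produce intermediate-layer activations close to the anchor's, while an unrelated query (negative) should produce activations at least a margin farther away.

```python
import numpy as np

def cosine_distance(u, v):
    """1 minus cosine similarity between two activation vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def layer_triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss over one layer's activations (hypothetical
    sketch): pulls paraphrase activations toward the anchor and pushes
    unrelated activations at least `margin` farther away."""
    return max(0.0,
               cosine_distance(anchor, positive)
               - cosine_distance(anchor, negative)
               + margin)
```

In the paper's setting this loss would be computed per intermediate layer on synthetic (anchor, paraphrase, distractor) triplets and summed across layers during fine-tuning.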

📝 Abstract
Retrieval-Augmented Generation (RAG) systems leverage Large Language Models (LLMs) to generate accurate and reliable responses that are grounded in retrieved context. However, LLMs often generate inconsistent outputs for semantically equivalent inputs, a problem compounded by the scarcity of consistency-focused training data and the limitations of current fine-tuning techniques in enhancing output consistency. We propose a new approach combining systematic synthetic data generation, triplet loss for better embeddings, and a novel layer-wise model merging approach. Using consistency-aware weights derived from intermediate layer activations, our method effectively integrates knowledge from specialized models. Experimental results show that our merged model significantly enhances output consistency, achieving a ~47.5% improvement in response similarity over the baseline, thus offering a practical solution for increasing the reliability of an industrial RAG system.
Problem

Research questions and friction points this paper is trying to address.

Addresses inconsistent LLM outputs for equivalent semantic inputs
Solves scarcity of consistency-focused training data limitations
Improves reliability of industrial RAG system generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-wise model merging with consistency-aware weights
Systematic synthetic data generation for training
Triplet loss optimization for improved embeddings
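The layer-wise merging idea listed above can be sketched as follows. This is a hypothetical rendering, not the paper's exact scheme: the activation-to-weight mapping and the linear interpolation of parameters are assumptions. Per layer, the two models' parameters are blended with a coefficient derived from how similar their activations are on probe inputs, so layers that already agree contribute more uniformly.

```python
import numpy as np

def merge_layers(layers_a, layers_b, acts_a, acts_b):
    """Merge two models layer by layer (illustrative sketch).

    layers_a/layers_b: per-layer parameter arrays of the two models.
    acts_a/acts_b: per-layer activation vectors on shared probe inputs.
    The interpolation weight for each layer is derived from the cosine
    similarity of the two models' activations at that layer.
    """
    merged = []
    for Wa, Wb, ha, hb in zip(layers_a, layers_b, acts_a, acts_b):
        sim = np.dot(ha, hb) / (np.linalg.norm(ha) * np.linalg.norm(hb))
        alpha = (sim + 1.0) / 2.0  # map cosine in [-1, 1] to a weight in [0, 1]
        merged.append(alpha * Wa + (1.0 - alpha) * Wb)
    return merged
```

When the two models' activations agree at a layer, `alpha` approaches 1 and that layer is taken mostly from the first model; in the paper's dynamic scheme these weights would instead be computed from consistency-aware activation statistics across many probe queries.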
Xujun Peng
AI Foundations, Capital One, McLean, VA, USA
Anoop Kumar
AI Foundations, Capital One, McLean, VA, USA
Jingyu Wu
AI Foundations, Capital One, McLean, VA, USA
Parker Glenn
Capital One
Natural Language Processing · Computational Linguistics · Conversational AI
Daben Liu
Capital One
Generative AI · NLP · Automatic Speech Recognition