TermGPT: Multi-Level Contrastive Fine-Tuning for Terminology Adaptation in Legal and Financial Domain

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) suffer from the isotropy problem in domain-specific fields such as law and finance: their embedding spaces discriminate poorly between closely related terms, impairing their ability to capture subtle semantic distinctions and limiting downstream task performance. To address this, we propose TermGPT, a multi-level contrastive fine-tuning framework that applies contrastive learning jointly at the sentence and token levels. TermGPT constructs semantically consistent yet highly discriminative positive/negative pairs via sentence-level graph modeling and introduces a context-aware sampling strategy that integrates topological and semantic cues for fine-grained representation optimization. Furthermore, we curate the first financial terminology evaluation dataset derived from official regulatory documents. Experiments show that TermGPT outperforms existing methods on term discrimination tasks in the legal and financial domains, enhancing the fidelity and discriminability of the domain-specific term representations on which downstream applications such as legal judgment prediction and financial risk analysis depend.

📝 Abstract
Large language models (LLMs) have demonstrated impressive performance in text generation tasks; however, their embedding spaces often suffer from the isotropy problem, resulting in poor discrimination of domain-specific terminology, particularly in legal and financial contexts. This weakness in terminology-level representation can severely hinder downstream tasks such as legal judgment prediction or financial risk analysis, where subtle semantic distinctions are critical. To address this problem, we propose TermGPT, a multi-level contrastive fine-tuning framework designed for terminology adaptation. We first construct a sentence graph to capture semantic and structural relations, and generate semantically consistent yet discriminative positive and negative samples based on contextual and topological cues. We then devise a multi-level contrastive learning approach at both the sentence and token levels, enhancing global contextual understanding and fine-grained terminology discrimination. To support robust evaluation, we construct the first financial terminology dataset derived from official regulatory documents. Experiments show that TermGPT outperforms existing baselines in term discrimination tasks within the finance and legal domains.
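The paper's exact objective is not reproduced here, but a minimal sketch of a multi-level InfoNCE-style loss, combining a sentence-level term with averaged token-level terms under a hypothetical weighting `alpha`, might look like:

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE loss for one anchor: pull the positive close, push negatives away."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits /= temperature
    # cross-entropy with the positive placed at index 0
    return -logits[0] + np.log(np.exp(logits).sum())

def multi_level_loss(sent_triplet, token_triplets, alpha=0.5):
    """Combine one sentence-level contrastive term with averaged token-level terms.

    `alpha` is a hypothetical mixing weight; the paper's actual formulation
    and pair-construction strategy may differ.
    """
    s_anchor, s_pos, s_negs = sent_triplet
    sent_loss = info_nce(s_anchor, s_pos, s_negs)
    tok_loss = np.mean([info_nce(a, p, n) for a, p, n in token_triplets])
    return alpha * sent_loss + (1 - alpha) * tok_loss
```

The sentence-level term encourages global contextual consistency, while the token-level terms push apart embeddings of confusable domain terms; the loss is lower when the anchor is closer to its positive than to the negatives.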
Problem

Research questions and friction points this paper is trying to address.

Addressing poor domain-specific terminology discrimination in legal and financial LLMs
Enhancing fine-grained semantic distinction for legal judgment and financial risk tasks
Solving isotropy problems in embedding spaces through multi-level contrastive learning
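The isotropy problem named above refers to embeddings collapsing into a narrow cone of the space, so that even unrelated terms score high cosine similarity. A common quick diagnostic (not specific to this paper) is the average pairwise cosine similarity over a sample of embeddings:

```python
import numpy as np

def avg_pairwise_cosine(embeddings):
    """Mean cosine similarity over all distinct pairs of row vectors.

    Values near 0 suggest a roughly isotropic space; values near 1 suggest
    embeddings collapsed into a narrow cone (poor term discrimination).
    """
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = X @ X.T
    n = len(X)
    off_diag = sims.sum() - np.trace(sims)  # exclude self-similarities
    return off_diag / (n * (n - 1))
```

Random Gaussian vectors in a high-dimensional space score near 0 on this metric, while vectors sharing a large common offset score near 1.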
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-level contrastive fine-tuning for terminology adaptation
Sentence graph construction capturing semantic and structural relations
Context-aware sampling combining topological and semantic cues to generate discriminative positive/negative samples
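The sampling contributions above can be sketched under simple assumptions: cosine-similarity edges with a hypothetical `threshold`, positives drawn from graph neighbors, and hard negatives drawn from the most similar non-neighbors (combining topological and semantic cues). The paper's actual graph construction may differ.

```python
import numpy as np

def build_sentence_graph(sent_embs, threshold=0.6):
    """Adjacency matrix linking sentences whose cosine similarity exceeds threshold."""
    X = sent_embs / np.linalg.norm(sent_embs, axis=1, keepdims=True)
    sims = X @ X.T
    adj = (sims > threshold) & ~np.eye(len(X), dtype=bool)
    return adj, sims

def sample_pairs(adj, sims, anchor, n_negs=5):
    """Positives: graph neighbors of the anchor (semantically consistent).

    Negatives: non-neighbors ranked by similarity, so the hardest (most
    confusable) non-neighbors are sampled first.
    """
    positives = np.flatnonzero(adj[anchor])
    non_neighbors = np.flatnonzero(~adj[anchor])
    non_neighbors = non_neighbors[non_neighbors != anchor]
    hard_negs = non_neighbors[np.argsort(-sims[anchor, non_neighbors])][:n_negs]
    return positives, hard_negs
```

Ranking non-neighbors by similarity is what makes the negatives "hard": they are close enough in embedding space to be confusable with the anchor, which gives the contrastive loss a stronger training signal than uniformly random negatives.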