The impact of fine tuning in LLaMA on hallucinations for named entity extraction in legal documentation

📅 2025-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses hallucination in large language models (LLMs) when extracting critical legal entities, such as disability percentage and compensation amount, from judicial documents on traffic accidents. We propose a two-stage approach: (1) semantic retrieval using MiniLM-L12-v2 or text-embedding-ada-002 to locate the relevant document passages; and (2) entity extraction via prompting and LoRA fine-tuning. The work shows that LoRA fine-tuning substantially reduces hallucination in LLaMA-family models. The LLaMA-3 8B base model achieves 76.6% accuracy, close to the fine-tuned LLaMA-2 70B (79.4%, a gain of 17.7 percentage points over its 61.7% base version), which points to rapid progress in model development. GPT-4 Turbo attains the best performance at 86.1%. All LLM-based methods substantially outperform the classic regular-expression method (39.5%).

📝 Abstract

The extraction of information about traffic accidents from legal documents is crucial for quantifying insurance company costs. Extracting entities such as percentages of physical and/or psychological disability and the associated compensation amounts is challenging, even for experts, because of the subtle arguments and reasoning in court decisions. A two-step procedure is proposed: first, segmenting the document and identifying the most relevant segments, and then extracting the entities. For text segmentation, two methodologies are compared: a classic method based on regular expressions, and a second approach that divides the document into blocks of n tokens, which are then vectorized using multilingual models for semantic search (text-embedding-ada-002 or MiniLM-L12-v2). Subsequently, large language models (LLaMA-2 7B, LLaMA-2 70B, LLaMA-3 8B, and GPT-4 Turbo) are applied with prompting to the selected segments for entity extraction. For the LLaMA models, fine-tuning is performed using LoRA. LLaMA-2 7B, even with zero temperature, produces a significant number of hallucinations in its extractions, which is a major concern for named entity extraction. This work shows that these hallucinations are substantially reduced after fine-tuning the model. The methodology based on segment vectorization and subsequent use of LLMs significantly outperforms the classic method, which achieves an accuracy of 39.5%. Among open-source models, LLaMA-2 70B with fine-tuning achieves the highest accuracy, 79.4%, surpassing its base version at 61.7%. Notably, the base LLaMA-3 8B model already performs comparably to the fine-tuned LLaMA-2 70B model, achieving 76.6%, highlighting the rapid progress in model development. Meanwhile, GPT-4 Turbo achieves the highest accuracy overall at 86.1%.
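The segmentation step described in the abstract (fixed-size token blocks, vectorization, semantic search) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: whitespace splitting stands in for a real tokenizer, and a bag-of-words count vector stands in for the MiniLM-L12-v2 or text-embedding-ada-002 embeddings so that the example runs with the standard library only. The function names, block size, and query are all hypothetical.

```python
import math
import re
from collections import Counter


def chunk_tokens(text, n=50):
    """Split text into consecutive blocks of at most n whitespace tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(0, len(tokens), n)]


def embed(text):
    """Toy embedding: word counts (a stand-in for a sentence-embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_segments(document, query, n=50, k=3):
    """Return the k blocks most similar to the query, best first."""
    query_vec = embed(query)
    blocks = chunk_tokens(document, n)
    ranked = sorted(blocks, key=lambda b: cosine(embed(b), query_vec), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    doc = (
        "The court heard the parties on procedural matters . " * 5
        + "The claimant has a permanent disability of 15 percent "
        + "and compensation of 20000 euros . "
        + "Costs are awarded as usual in these proceedings . " * 5
    )
    for segment in top_segments(doc, "disability percentage compensation amount", n=10, k=1):
        print(segment)
```

In a real pipeline, `embed` would call the chosen embedding model and the selected blocks would be passed to the LLM prompt for entity extraction.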

Problem

Research questions and friction points this paper is trying to address.

Reducing hallucinations in named entity extraction from legal documents

Improving accuracy of traffic accident information extraction for insurance

Comparing segmentation methods for entity extraction in legal texts
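One simple way to flag a possibly hallucinated numeric entity is to check whether the extracted value actually occurs in the source segment. The sketch below illustrates that idea only; this summary does not say how the paper itself measures hallucinations, so the check, the function names, and the number-normalization rules (thousands separators removed, decimal comma converted to a dot) are assumptions.

```python
import re

# A number: digits, optionally with "." or "," between digit groups.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")
# A "." or "," followed by exactly three digits is treated as a thousands separator.
THOUSANDS = re.compile(r"(?<=\d)[.,](?=\d{3}(?!\d))")


def normalize(number):
    """Drop thousands separators and use '.' as the decimal mark."""
    return THOUSANDS.sub("", number).replace(",", ".")


def is_grounded(extracted, source):
    """True if every number in the extraction also appears in the source text."""
    values = {normalize(m) for m in NUMBER.findall(extracted)}
    source_values = {normalize(m) for m in NUMBER.findall(source)}
    return bool(values) and values <= source_values


if __name__ == "__main__":
    segment = "The court sets the disability at 15 % and awards 20.000 euros."
    print(is_grounded("20000", segment))  # value present in the segment
    print(is_grounded("25", segment))     # value absent from the segment
```

A check like this only catches values that are absent from the text; it cannot tell whether a value that is present was assigned to the right entity, and it treats an ambiguous string such as "1.500" as fifteen hundred.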

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-step document segmentation and entity extraction

Fine-tuning LLaMA models using LoRA

Vectorization with multilingual models for semantic search
