Luwen Technical Report

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges large language models face in the legal domain—namely, dense specialized terminology, complex reasoning requirements, and rapidly evolving knowledge. To tackle these issues, the authors propose Luwen, an open-source language model tailored for Chinese legal applications, built upon the Baichuan foundation. Luwen undergoes multi-stage adaptation through continual pre-training on legal corpora, instruction fine-tuning, and retrieval-augmented generation (RAG) enhanced with a legal knowledge base. This integrated approach effectively combines domain-specific knowledge with general linguistic capabilities. Comprehensive evaluations across five tasks—including judgment prediction, bar exam question answering, legal summarization, statutory QA, and judicial reasoning—demonstrate that Luwen significantly outperforms strong baseline models, confirming its effectiveness and generalization capacity in specialized legal scenarios.
📝 Abstract
Large language models have demonstrated remarkable capabilities across a wide range of natural language processing tasks, yet their application in the legal domain remains challenging due to the specialized terminology, complex reasoning requirements, and rapidly evolving legal knowledge involved. In this paper, we present Luwen, an open-source Chinese legal language model built upon the Baichuan foundation model through three key techniques: continual pre-training on a large-scale legal corpus, supervised fine-tuning with carefully curated legal instruction data, and retrieval-augmented generation integrated with a comprehensive legal knowledge base. We evaluate Luwen on five representative legal tasks spanning both prediction and generation settings, including legal judgment prediction, judicial examination, legal text summarization, law article question answering, and judicial decision reasoning. Experimental results show that Luwen outperforms several strong baselines, demonstrating the effectiveness of our approach in adapting general-purpose language models to the legal domain.
Problem

Research questions and friction points this paper is trying to address.

legal domain
specialized terminology
complex reasoning
evolving legal knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

continual pre-training
supervised fine-tuning
retrieval-augmented generation
legal language model
domain adaptation
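
The retrieval-augmented generation step listed above grounds the model's answers in a legal knowledge base before generation. A minimal sketch of that idea, using a toy token-overlap retriever and illustrative names (not Luwen's actual implementation, which pairs a trained retriever with the fine-tuned Baichuan model):

```python
# Minimal RAG sketch: rank legal passages by relevance to a query,
# then prepend the top hits to the prompt so the model answers with grounding.
# The overlap score and all example statutes are illustrative placeholders.

def score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query tokens found in the passage."""
    q, p = set(query.split()), set(passage.split())
    return len(q & p) / len(q) if q else 0.0

def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the top-k most relevant passages from the knowledge base."""
    ranked = sorted(knowledge_base, key=lambda p: score(query, p), reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble a prompt that cites the retrieved statutes before the question."""
    context = "\n".join(f"[Statute] {p}" for p in passages)
    return f"{context}\n[Question] {query}\n[Answer]"

# Hypothetical mini knowledge base of statute snippets.
knowledge_base = [
    "Article 266 theft of public or private property",
    "Article 133 traffic accident causing serious injury",
    "Article 234 intentional injury of another person",
]

passages = retrieve("penalty for theft of private property", knowledge_base, k=1)
prompt = build_prompt("penalty for theft of private property", passages)
```

In a full system the overlap score would be replaced by dense-embedding similarity and the prompt fed to the fine-tuned model; the control flow of retrieve-then-generate is the same.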