An Industrial-Scale Insurance LLM Achieving Verifiable Domain Mastery and Hallucination Control without Competence Trade-offs

📅 2026-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of aligning large language models for high-stakes domains like insurance, where strict regulatory compliance, minimal hallucination, and strong general capabilities must coexist—a balance rarely achieved by existing approaches. The authors propose an end-to-end alignment paradigm that integrates verifiable data synthesis, dynamic data annealing, and a progressive SFT-RL curriculum framework combining RLVR and RLAIF to train INS-S1, a domain-specialized insurance model. Evaluated on INSEva—the most comprehensive insurance benchmark to date—INS-S1 achieves state-of-the-art performance, surpassing strong general-purpose models such as DeepSeek-R1 and Gemini-2.5-Pro while maintaining top-tier general abilities and reducing hallucination rates to just 0.6%.

📝 Abstract
Adapting Large Language Models (LLMs) to high-stakes vertical domains like insurance presents a significant challenge: scenarios demand strict adherence to complex regulations and business logic with zero tolerance for hallucinations. Existing approaches often suffer from a Competency Trade-off (sacrificing general intelligence for domain expertise) or rely heavily on RAG without intrinsic reasoning. To bridge this gap, we present INS-S1, an insurance-specific LLM family trained via a novel end-to-end alignment paradigm. Our approach features two methodological innovations: (1) a Verifiable Data Synthesis System that constructs hierarchical datasets for actuarial reasoning and compliance; and (2) a Progressive SFT-RL Curriculum Framework that integrates dynamic data annealing with a synergistic mix of Verified Reasoning (RLVR) and AI Feedback (RLAIF). By optimizing data ratios and reward signals, this framework enforces domain constraints while preventing catastrophic forgetting. Additionally, we release INSEva, the most comprehensive insurance benchmark to date (39k+ samples). Extensive experiments show that INS-S1 achieves SOTA performance on domain tasks, significantly outperforming DeepSeek-R1 and Gemini-2.5-Pro. Crucially, it maintains top-tier general capabilities and achieves a record-low 0.6% hallucination rate (HHEM). Our results demonstrate that rigorous domain specialization can be achieved without compromising general intelligence.
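The two training-signal ideas in the abstract, dynamic data annealing and a blended RLVR/RLAIF reward, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the linear annealing schedule, and the 0.7/0.3 reward weighting are all assumptions.

```python
def anneal_domain_ratio(step: int, total_steps: int,
                        start: float = 0.2, end: float = 0.8) -> float:
    """Dynamic data annealing (illustrative): linearly shift the fraction of
    domain (insurance) samples per batch from `start` to `end` over training,
    so early batches preserve general ability and late batches specialize."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * t


def blended_reward(verifier_pass: bool, judge_score: float,
                   w_verify: float = 0.7) -> float:
    """Blend a binary verifiable reward (RLVR-style: did the answer pass a
    programmatic compliance/actuarial check?) with a [0, 1] AI-judge score
    (RLAIF-style). The weight w_verify is a hypothetical hyperparameter."""
    r_verify = 1.0 if verifier_pass else 0.0
    return w_verify * r_verify + (1.0 - w_verify) * judge_score


# Early batches are mostly general data; late batches are mostly domain data.
print(anneal_domain_ratio(0, 1000))      # start of training
print(anneal_domain_ratio(1000, 1000))   # end of training
print(blended_reward(True, 0.5))         # verified answer, middling judge score
```

The point of the blend is that the verifiable component anchors the policy to hard domain constraints while the AI-feedback component still rewards fluency and helpfulness on answers no verifier covers.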
Problem

Research questions and friction points this paper is trying to address.

insurance
hallucination control
domain mastery
competence trade-off
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Verifiable Data Synthesis
Progressive SFT-RL Curriculum
Hallucination Control
Domain Specialization without Trade-offs
Insurance LLM
Authors
Qian Zhu (Ant Group)
Xinnan Guo (Ant Group)
Jingjing Huo (Ant Group)
Jun Li (Ant Group)
Pan Liu (Ant Group)
Wenyan Yang (Aalto University)
Wanqing Xu (Ant Group)
Xuan Lin (Georgia Institute of Technology)