Towards Better Evaluation for Generated Patent Claims

📅 2025-05-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing automatic evaluation metrics diverge significantly from patent expert judgments, which undermines reliable quality assessment of generated claims. To address this, we introduce Patent-CE, the first benchmark dataset annotated by domain-expert patent attorneys, and propose PatClaimEval, a dedicated multidimensional evaluation framework. PatClaimEval explicitly defines and quantifies five claim criteria: feature completeness, conceptual clarity, terminology consistency, logical linkage, and overall quality. It integrates multi-granularity semantic modeling with structured linguistic rules to achieve expert-aligned assessment. Experimental results show that PatClaimEval achieves the highest human–machine agreement on all five criteria (average Spearman ρ = 0.82), substantially outperforming general-purpose metrics such as BLEU, ROUGE, and BERTScore. This work establishes an interpretable, reproducible paradigm for evaluating generated patent text.
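To make "agreement with expert judgments" concrete, the sketch below computes Spearman's rank correlation between an automatic metric's scores and expert ratings. It is not the paper's implementation: the function names (`average_ranks`, `pearson`, `spearman_rho`) and all scores are hypothetical, chosen only to illustrate how such a correlation can be measured.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


def spearman_rho(metric_scores, expert_ratings):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(metric_scores), average_ranks(expert_ratings))


# Hypothetical data: five generated claims, each rated by an expert (1 to 5)
# and scored by two made-up automatic metrics.
expert_ratings = [1, 2, 3, 4, 5]
metric_a = [0.10, 0.35, 0.30, 0.70, 0.90]  # mostly follows the expert order
metric_b = [0.90, 0.20, 0.60, 0.10, 0.40]  # largely disagrees with it

rho_a = spearman_rho(metric_a, expert_ratings)
rho_b = spearman_rho(metric_b, expert_ratings)
print(f"metric_a rho = {rho_a:.2f}")
print(f"metric_b rho = {rho_b:.2f}")
```

Because Spearman's ρ depends only on ranks, a metric is rewarded for ordering claims the way experts do, regardless of the scale of its raw scores.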

📝 Abstract

Patent claims define the scope of protection and establish the legal boundaries of an invention. Drafting these claims is a complex and time-consuming process that usually requires the expertise of skilled patent attorneys, which can form a large access barrier for many small enterprises. To solve these challenges, researchers have investigated the use of large language models (LLMs) for automating patent claim generation. However, existing studies highlight inconsistencies between automated evaluation metrics and human expert assessments. To bridge this gap, we introduce Patent-CE, the first comprehensive benchmark for evaluating patent claims. Patent-CE includes comparative claim evaluations annotated by patent experts, focusing on five key criteria: feature completeness, conceptual clarity, terminology consistency, logical linkage, and overall quality. Additionally, we propose PatClaimEval, a novel multi-dimensional evaluation method specifically designed for patent claims. Our experiments demonstrate that PatClaimEval achieves the highest correlation with human expert evaluations across all assessment criteria among all tested metrics. This research provides the groundwork for more accurate evaluations of automated patent claim generation systems.

Problem

Research questions and friction points this paper is trying to address.

LLM-based patent claim generation lacks reliable evaluation metrics.

Inconsistencies exist between automated metrics and human expert assessments.

Need for a comprehensive benchmark to evaluate patent claim quality.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Patent-CE benchmark for patent claims evaluation

Proposes PatClaimEval multi-dimensional evaluation method

Shows that PatClaimEval correlates best with patent expert judgments among all tested metrics

🔎 Similar Papers

Can Large Language Models Generate High-quality Patent Claims?