🤖 AI Summary
Existing automatic evaluation metrics diverge significantly from patent experts' judgments, which undermines the reliability of quality assessment for generated claims. To address this, we introduce Patent-CE, the first benchmark dataset annotated by domain-expert patent attorneys, and propose PatClaimEval, a dedicated multidimensional evaluation framework. PatClaimEval is the first to explicitly define and quantify five core claim dimensions: feature completeness, conceptual clarity, terminology consistency, logical linkage, and overall quality. It combines multi-granularity semantic modeling with structured linguistic rules to produce expert-aligned assessments. Experimental results show that PatClaimEval achieves the highest human–machine agreement on all five dimensions (average Spearman ρ = 0.82), substantially outperforming general-purpose metrics such as BLEU, ROUGE, and BERTScore. This work establishes an interpretable, reproducible paradigm for evaluating generated patent text.
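The human–machine agreement figure is a rank correlation between a metric's scores and the experts' scores, averaged over the evaluation dimensions. The following is a minimal sketch of how such an average could be computed, assuming each dimension provides one numeric metric score and one numeric expert score per claim. The helper names (`average_ranks`, `spearman_rho`, `mean_agreement`) are hypothetical, and this is not the paper's evaluation code.

```python
from statistics import mean

# Dimension names as listed in the abstract; the dict-of-lists layout is an assumption.
DIMENSIONS = [
    "feature completeness",
    "conceptual clarity",
    "terminology consistency",
    "logical linkage",
    "overall quality",
]


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("constant input: correlation is undefined")
    return cov / (vx * vy) ** 0.5


def mean_agreement(metric_scores, expert_scores):
    """Per-dimension rho and its average; both args map dimension -> per-claim scores."""
    rhos = {d: spearman_rho(metric_scores[d], expert_scores[d]) for d in DIMENSIONS}
    return mean(rhos.values()), rhos


if __name__ == "__main__":
    # Toy scores for five claims (invented for illustration, not Patent-CE data).
    expert = {d: [1, 2, 3, 4, 5] for d in DIMENSIONS}
    metric = {d: [2, 1, 4, 3, 5] for d in DIMENSIONS}
    avg, per_dim = mean_agreement(metric, expert)
    print(avg, per_dim)
```

Ranking with tie-averaging before taking the Pearson correlation is the standard way to compute Spearman's ρ when scores repeat, which is common with ordinal expert ratings. A library routine such as SciPy's `spearmanr` could replace the hand-written helpers.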
📝 Abstract
Patent claims define the scope of protection and establish the legal boundaries of an invention. Drafting them is a complex, time-consuming process that usually requires the expertise of skilled patent attorneys, which can pose a significant barrier to access for many small enterprises. To address these challenges, researchers have investigated using large language models (LLMs) to automate patent claim generation. However, existing studies highlight inconsistencies between automated evaluation metrics and human expert assessments. To bridge this gap, we introduce Patent-CE, the first comprehensive benchmark for evaluating patent claims. Patent-CE contains comparative claim evaluations annotated by patent experts along five key criteria: feature completeness, conceptual clarity, terminology consistency, logical linkage, and overall quality. We also propose PatClaimEval, a novel multi-dimensional evaluation method designed specifically for patent claims. Our experiments show that, among all tested metrics, PatClaimEval achieves the highest correlation with human expert evaluations on every assessment criterion. This research lays the groundwork for more accurate evaluation of automated patent claim generation systems.
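Because Patent-CE's expert annotations are comparative, agreement between a metric and the experts can also be framed pairwise: how often does the metric prefer the same claim as the annotators? The following is a minimal sketch under two assumptions. First, each annotation is a pair of candidate claims plus the expert-preferred side. Second, a metric tie counts as a disagreement, which is one of several possible conventions. The stand-in metric `distinct_words` is deliberately naive and is neither PatClaimEval nor any metric named in the paper. `Comparison`, `pairwise_agreement`, and `distinct_words` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Comparison:
    """One hypothetical expert-annotated pair of candidate claims."""
    claim_a: str
    claim_b: str
    expert_prefers: str  # "a" or "b"


def pairwise_agreement(pairs: Sequence[Comparison], metric: Callable[[str], float]) -> float:
    """Fraction of pairs where the metric's higher-scoring claim matches the expert choice.

    Pairs the metric scores as tied stay in the denominator, so they count as misses.
    """
    if not pairs:
        raise ValueError("no comparisons supplied")
    hits = 0
    for p in pairs:
        sa, sb = metric(p.claim_a), metric(p.claim_b)
        if sa == sb:
            continue  # tie: no preference, counted as disagreement
        metric_prefers = "a" if sa > sb else "b"
        if metric_prefers == p.expert_prefers:
            hits += 1
    return hits / len(pairs)


def distinct_words(claim: str) -> float:
    """Naive stand-in metric for illustration only: number of distinct tokens."""
    return float(len(set(claim.split())))


# Invented toy pairs, not Patent-CE data.
example_pairs = [
    Comparison("A device comprising a sensor and a processor", "A device", "a"),
    Comparison("A method", "A method comprising heating a substrate", "b"),
    Comparison("A system wherein the system", "A system", "b"),
]

if __name__ == "__main__":
    print(pairwise_agreement(example_pairs, distinct_words))
```

Any scoring function with the signature `str -> float` can be plugged in as `metric`, so the same harness can compare several candidate metrics against the same set of expert preferences.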