PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent Claims

📅 2025-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing natural language generation (NLG) evaluation metrics inadequately assess patent claims across three critical dimensions: legal validity, technical accuracy, and structural compliance. Method: We propose the first multidimensional automated evaluation framework specifically designed for patent claims, integrating hierarchical claim decomposition, dual-standard (legal/technical) verification, and a tri-dimensional scoring system (structural, semantic, legal). Our approach combines rule-based structural compliance checking, semantic alignment leveraging a domain-specific patent knowledge graph, legal validity pattern matching, and expert-annotated, multi-dimensional weighted scoring. Contribution/Results: Evaluated on 400 Claim 1 statements generated by GPT-4o-mini, our framework achieves a Pearson correlation coefficient of 0.819 with human expert judgments—significantly outperforming state-of-the-art NLG metrics. It further demonstrates strong generalizability across diverse LLMs, including Claude-3.5-Haiku and Gemini-1.5-flash. This work establishes a foundational benchmark for rigorous, automated patent claim assessment.

📝 Abstract
Natural language generation (NLG) metrics play a central role in evaluating generated texts, but are not well suited for the structural and legal characteristics of patent documents. Large language models (LLMs) offer strong potential in automating patent generation, yet research on evaluating LLM-generated patents remains limited, especially in evaluating the generation quality of patent claims, which are central to defining the scope of protection. Effective claim evaluation requires addressing legal validity, technical accuracy, and structural compliance. To address this gap, we introduce PatentScore, a multi-dimensional evaluation framework for assessing LLM-generated patent claims. PatentScore incorporates: (1) hierarchical decomposition for claim analysis; (2) domain-specific validation patterns based on legal and technical standards; and (3) scoring across structural, semantic, and legal dimensions. Unlike general-purpose NLG metrics, PatentScore reflects patent-specific constraints and document structures, enabling evaluation beyond surface similarity. We evaluate 400 GPT-4o-mini generated Claim 1s and report a Pearson correlation of $r = 0.819$ with expert annotations, outperforming existing NLG metrics. Furthermore, we conduct additional evaluations using other models such as Claude-3.5-Haiku and Gemini-1.5-flash, all of which show strong correlations with expert judgments, confirming the robustness and generalizability of our framework.
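The weighted tri-dimensional scoring and the Pearson-correlation validation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimension weights and score values are hypothetical placeholders (the paper derives its weights from expert annotation).

```python
import statistics

# Illustrative weights for the three scoring dimensions; the paper's actual
# weights come from expert-annotated, multi-dimensional weighting and are
# not reported here.
WEIGHTS = {"structural": 0.3, "semantic": 0.4, "legal": 0.3}


def patent_score(dim_scores: dict) -> float:
    """Combine per-dimension scores (each assumed in [0, 1]) into one
    weighted overall score, as in the tri-dimensional scoring system."""
    return sum(WEIGHTS[d] * dim_scores[d] for d in WEIGHTS)


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient, the statistic used to compare
    automated scores against human expert judgments."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

Validating the framework then amounts to computing `pearson(automated_scores, expert_scores)` over the 400 evaluated claims; the paper reports r = 0.819 for GPT-4o-mini outputs.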
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM-generated patent claims lacks suitable metrics
Assessing legal validity and technical accuracy in patents
Developing a patent-specific multi-dimensional evaluation framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical decomposition for claim analysis
Domain-specific legal and technical validation
Multi-dimensional structural, semantic, legal scoring