PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent Claims

📅 2025-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing natural language generation (NLG) evaluation metrics inadequately assess patent claims across three critical dimensions: legal validity, technical accuracy, and structural compliance. Method: We propose the first multidimensional automated evaluation framework specifically designed for patent claims, integrating hierarchical claim decomposition, dual-standard (legal/technical) verification, and a tri-dimensional scoring system (structural, semantic, legal). Our approach combines rule-based structural compliance checking, semantic alignment leveraging a domain-specific patent knowledge graph, legal validity pattern matching, and expert-annotated, multi-dimensional weighted scoring. Contribution/Results: Evaluated on 400 Claim 1 statements generated by GPT-4o-mini, our framework achieves a Pearson correlation coefficient of 0.819 with human expert judgments—significantly outperforming state-of-the-art NLG metrics. It further demonstrates strong generalizability across diverse LLMs, including Claude-3.5-Haiku and Gemini-1.5-flash. This work establishes a foundational benchmark for rigorous, automated patent claim assessment.

📝 Abstract
Natural language generation (NLG) metrics play a central role in evaluating generated texts, but are not well suited for the structural and legal characteristics of patent documents. Large language models (LLMs) offer strong potential in automating patent generation, yet research on evaluating LLM-generated patents remains limited, especially in evaluating the generation quality of patent claims, which are central to defining the scope of protection. Effective claim evaluation requires addressing legal validity, technical accuracy, and structural compliance. To address this gap, we introduce PatentScore, a multi-dimensional evaluation framework for assessing LLM-generated patent claims. PatentScore incorporates: (1) hierarchical decomposition for claim analysis; (2) domain-specific validation patterns based on legal and technical standards; and (3) scoring across structural, semantic, and legal dimensions. Unlike general-purpose NLG metrics, PatentScore reflects patent-specific constraints and document structures, enabling evaluation beyond surface similarity. We evaluate 400 GPT-4o-mini generated Claim 1s and report a Pearson correlation of $r = 0.819$ with expert annotations, outperforming existing NLG metrics. Furthermore, we conduct additional evaluations using other models such as Claude-3.5-Haiku and Gemini-1.5-flash, all of which show strong correlations with expert judgments, confirming the robustness and generalizability of our framework.
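The weighted tri-dimensional scoring and the Pearson-correlation validation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimension weights and score values are hypothetical placeholders (the paper derives its weights from expert annotation).

```python
import statistics

# Illustrative weights for the three scoring dimensions; the paper's actual
# weights come from expert-annotated, multi-dimensional weighting and are
# not reported here.
WEIGHTS = {"structural": 0.3, "semantic": 0.4, "legal": 0.3}


def patent_score(dim_scores: dict) -> float:
    """Combine per-dimension scores (each assumed in [0, 1]) into one
    weighted overall score, as in the tri-dimensional scoring system."""
    return sum(WEIGHTS[d] * dim_scores[d] for d in WEIGHTS)


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient, the statistic used to compare
    automated scores against human expert judgments."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

Validating the framework then amounts to computing `pearson(automated_scores, expert_scores)` over the 400 evaluated claims; the paper reports r = 0.819 for GPT-4o-mini outputs.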
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM-generated patent claims lacks suitable metrics
Assessing legal validity and technical accuracy in patents
Developing a patent-specific multi-dimensional evaluation framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical decomposition for claim analysis
Domain-specific legal and technical validation
Multi-dimensional structural, semantic, legal scoring