🤖 AI Summary
This study addresses the labor-intensive nature of patent abstract writing and the lack of systematic evaluation methodologies for automated drafting. To this end, we propose PATENTWRITER, the first unified evaluation framework designed specifically for patent abstract generation. Methodologically, we construct a multi-dimensional benchmark assessing language quality, technical accuracy, robustness to input perturbations, and downstream task adaptability; evaluate six leading large language models (including GPT-4 and LLaMA-3) under zero-shot, few-shot, and chain-of-thought prompting paradigms; and integrate automated metrics such as BLEU, ROUGE, and BERTScore. Experimental results show that general-purpose LLMs can produce abstracts with strong factual fidelity and patent-style adherence, often surpassing domain-specific baselines. To foster reproducibility and advancement, we publicly release all code and datasets. This work establishes a rigorous, standardized, and practical evaluation foundation for LLM-driven intelligent patent drafting.
📝 Abstract
Large language models (LLMs) have emerged as transformative tools across many important fields. This paper pursues a paradigm shift in patent writing by leveraging LLMs to ease the tedious patent-filing process. We present PATENTWRITER, the first unified benchmarking framework for evaluating LLMs on patent abstract generation. Given the first claim of a patent, we evaluate six leading LLMs -- including GPT-4 and LLaMA-3 -- under a consistent setup spanning zero-shot, few-shot, and chain-of-thought prompting strategies to generate the patent's abstract. Our benchmark PATENTWRITER goes beyond surface-level evaluation: we systematically assess output quality with a comprehensive suite of metrics -- standard NLP measures (e.g., BLEU, ROUGE, BERTScore), robustness under three types of input perturbations, and applicability to two downstream tasks, patent classification and retrieval. We also conduct stylistic analysis of length, readability, and tone. Experimental results show that modern LLMs can generate high-fidelity and stylistically appropriate patent abstracts, often surpassing domain-specific baselines. Our code and dataset are open-sourced to support reproducibility and future research.
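To make the evaluation dimension concrete, here is a minimal sketch of the kind of n-gram overlap scoring (in the spirit of ROUGE-1 F1) that automated metrics like those in the benchmark rely on. This is an illustrative stdlib-only implementation, not the paper's actual evaluation code; the function name and whitespace tokenization are assumptions for the example.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Illustrative ROUGE-1-style F1: unigram overlap between a reference
    patent abstract and a generated one (simple whitespace tokenization)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each unigram counts at most as often as in the reference.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

In practice, benchmarks like this one would use standard metric implementations (e.g., reference BLEU/ROUGE packages and BERTScore's model-based similarity), which additionally handle stemming, multiple references, and longer n-grams.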