Pap2Pat: Benchmarking Outline-Guided Long-Text Patent Generation with Patent-Paper Pairs

📅 2024-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the underexplored challenge of generating long-form, technically rigorous patent descriptions with large language models (LLMs). The authors introduce PAP2PAT, described as the first open-source, real-world patent generation benchmark, comprising 1.8k patent-paper pairs and designed for outline-guided, chunked generation that uses the academic paper as the invention disclosure. The proposed approach combines chunk-based outline-guided generation with both zero-shot and fine-tuned models, and pairs multi-dimensional automatic evaluation with human analysis. Experiments reveal a fundamental trade-off: fine-tuning improves stylistic adaptation to patent conventions but exacerbates factual hallucination. The benchmark data, implementation code, and evaluation framework are fully open-sourced, establishing a reproducible foundation for research on LLM-supported patent generation.

📝 Abstract
Dealing with long and highly complex technical text is a challenge for Large Language Models (LLMs), which still have to unfold their potential in supporting expensive and time-intensive processes like patent drafting. Within patents, the description constitutes more than 90% of the document on average. Yet, its automatic generation remains understudied. When drafting patent applications, patent attorneys typically receive invention reports (IRs), which are usually confidential, hindering research on LLM-supported patent drafting. Often, pre-publication research papers serve as IRs. We leverage this duality to build PAP2PAT, an open and realistic benchmark for patent drafting consisting of 1.8k patent-paper pairs describing the same inventions. To address the complex long-document patent generation task, we propose chunk-based outline-guided generation using the research paper as invention specification. Our extensive evaluation using PAP2PAT and a human case study show that LLMs can effectively leverage information from the paper, but still struggle to provide the necessary level of detail. Fine-tuning leads to more patent-style language, but also to more hallucination. We release our data and code at https://github.com/boschresearch/Pap2Pat.
Problem

Research questions and friction points this paper is trying to address.

Challenges in generating long, complex patent texts using LLMs.
Lack of open benchmarks for patent drafting with LLMs.
Difficulty in maintaining detail and accuracy in patent-style language.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chunk-based outline-guided patent generation
Leveraging patent-paper pairs for benchmarking
Fine-tuning LLMs for patent-style language
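The chunk-based outline-guided idea can be sketched as a simple loop: split the target patent outline into chunks of consecutive sections, then prompt the model once per chunk, conditioning on the paper and the draft so far. The sketch below is illustrative only; the function names, prompt format, and chunking granularity are assumptions, not the paper's actual implementation.

```python
def chunk_outline(outline, max_sections=2):
    """Split a flat outline (list of section headings) into chunks of
    consecutive sections, each handled by one generation call."""
    return [outline[i:i + max_sections] for i in range(0, len(outline), max_sections)]

def build_prompt(paper_text, outline_chunk, drafted_so_far):
    """Assemble a prompt that conditions on the paper (as invention
    disclosure) and the draft produced so far. Hypothetical format."""
    headings = "\n".join(f"- {h}" for h in outline_chunk)
    return (
        "You are drafting a patent description.\n"
        f"Invention disclosure (research paper):\n{paper_text}\n\n"
        f"Draft so far:\n{drafted_so_far or '(nothing yet)'}\n\n"
        f"Continue the draft, covering only these outline sections:\n{headings}\n"
    )

def generate_description(paper_text, outline, llm):
    """Generate the full description chunk by chunk; `llm` is any
    callable mapping a prompt string to generated text."""
    draft = ""
    for chunk in chunk_outline(outline):
        draft += llm(build_prompt(paper_text, chunk, draft)) + "\n"
    return draft
```

With a three-section outline and the default chunk size of 2, this issues two generation calls; each later call sees the accumulated draft, which keeps every individual prompt well within context limits even for very long descriptions.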
🔎 Similar Papers
2024-06-27 · North American Chapter of the Association for Computational Linguistics · Citations: 5
2024-03-06 · Artificial Intelligence Review · Citations: 0