IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property

📅 2025-04-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing IP-domain evaluation benchmarks suffer from narrow coverage and unrealistic task scenarios, hindering comprehensive assessment of large language models (LLMs) on law-technology interdisciplinary tasks. Method: We introduce IPBench—the first comprehensive bilingual IP evaluation benchmark—covering eight IP mechanisms and twenty realistic tasks, underpinned by a novel systematic IP task taxonomy. The dataset is constructed via expert-curated annotation and rigorous domain-expert validation, supporting zero-shot and few-shot evaluation. Contribution/Results: Experiments across 16 state-of-the-art LLMs reveal a top accuracy of only 75.8%; open-source IP-specialized models substantially underperform general-purpose closed-source models. To foster trustworthy AI research in IP, we fully open-source the dataset, evaluation code, and protocols—with ongoing community-driven updates.

📝 Abstract
Intellectual Property (IP) is a unique domain that integrates technical and legal knowledge, making it inherently complex and knowledge-intensive. As large language models (LLMs) continue to advance, they show great potential for processing IP tasks, enabling more efficient analysis, understanding, and generation of IP-related content. However, existing datasets and benchmarks either focus narrowly on patents or cover limited aspects of the IP field, lacking alignment with real-world scenarios. To bridge this gap, we introduce the first comprehensive IP task taxonomy and a large, diverse bilingual benchmark, IPBench, covering 8 IP mechanisms and 20 tasks. This benchmark is designed to evaluate LLMs in real-world intellectual property applications, encompassing both understanding and generation. We benchmark 16 LLMs, ranging from general-purpose to domain-specific models, and find that even the best-performing model achieves only 75.8% accuracy, revealing substantial room for improvement. Notably, open-source IP and law-oriented models lag behind closed-source general-purpose models. We publicly release all data and code of IPBench and will continue to update it with additional IP-related tasks to better reflect real-world challenges in the intellectual property domain.
Problem

Research questions and friction points this paper is trying to address.

Lack of comprehensive benchmarks for LLMs in IP domain
Existing datasets misalign with real-world IP scenarios
Need to evaluate LLMs' performance on diverse IP tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces comprehensive IP task taxonomy
Develops bilingual benchmark IPBench
Evaluates 16 LLMs on IP tasks
Qiyao Wang
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Natural Language Processing, Large Language Models, Agentic AI, Patent Processing, AI for IP
Guhong Chen
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
LLM, NLP
Hongbo Wang
Dalian University of Technology, China
Huaren Liu
Dalian University of Technology, China
Minghui Zhu
Professor, Electrical Engineering and Computer Science, Pennsylvania State University
Systems and Control, Multi-agent Systems, Robotic Networks, Cyber-physical Systems
Zhifei Qin
Dalian University of Technology, China
Linwei Li
Dalian University of Technology, China
Yilin Yue
Dalian University of Technology, China
Shiqiang Wang
IBM T. J. Watson Research Center
Agentic AI, Collaborative & Federated AI, LLMs, Machine Learning, Optimization Algorithms
Jiayan Li
Dalian University of Technology, China
Yihang Wu
Dalian University of Technology, China
Ziqiang Liu
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Natural Language Processing, Large Language Models
Longze Chen
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Natural Language Processing
Run Luo
University of Chinese Academy of Sciences
text & video & audio pretraining, VLM, VLA, RL, 3DV
Liyang Fan
Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China
Jiaming Li
Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China
Lei Zhang
Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China
Kan Xu
University of Rochester
3-D ICs, Power Distribution Network, On-chip Power Noise, FinFET
Hong-Wei Lin
Dalian University of Technology, China
Hamid Alinejad-Rokny
ARC DECRA & UNSW Scientia Fellow, Head of BioMedical Machine Learning Lab
BioMedical Machine Learning, Machine Learning for Health, Medical Artificial Intelligence, LLMs
Shiwen Ni
Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China
Yuan Lin
Ocean College, Zhejiang University
Rheology, Polymer Physics, Multi-phase Flow
Min Yang
Bytedance
Vision Language Model, Computer Vision, Video Understanding