AI Summary
Design patents exhibit high abstraction and sparse semantic content in their imagery, leading to semantic ambiguity in classification and retrieval tasks and thereby compromising the accuracy of prior-art assessment. To address this, we propose DesignCLIP, the first unified multimodal representation framework tailored to design patents. Built upon the CLIP architecture, it integrates automatically generated fine-grained textual descriptions with multi-view image modeling and introduces a novel class-aware contrastive learning strategy. Moreover, we construct the first large-scale generated image-text pair dataset specifically for patent representation enhancement. DesignCLIP enables cross-modal patent retrieval and classification, achieving significant performance gains over state-of-the-art methods across multiple benchmarks. Experimental results confirm the effectiveness of multimodal co-modeling for deep semantic understanding of design patents.
Abstract
In the field of design patent analysis, traditional tasks such as patent classification and patent image retrieval heavily depend on image data. However, patent images -- typically sketches that capture the abstract and structural elements of an invention -- often fall short in conveying comprehensive visual context and semantic information. This inadequacy can lead to ambiguities in evaluation during prior art searches. Recent advances in vision-language models, such as CLIP, offer promising opportunities for more reliable and accurate AI-driven patent analysis. In this work, we leverage CLIP models to develop DesignCLIP, a unified framework for design patent applications built on a large-scale dataset of U.S. design patents. To address the unique characteristics of patent data, DesignCLIP incorporates class-aware classification and contrastive learning, utilizing generated detailed captions for patent images and multi-view image learning. We validate the effectiveness of DesignCLIP across various downstream tasks, including patent classification and patent retrieval. Additionally, we explore multimodal patent retrieval, which has the potential to enhance creativity and innovation in design by offering more diverse sources of inspiration. Our experiments show that DesignCLIP consistently outperforms baseline and SOTA models in the patent domain on all tasks. Our findings underscore the promise of multimodal approaches in advancing patent analysis. The codebase is available here: https://anonymous.4open.science/r/PATENTCLIP-4661/README.md.
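To make the class-aware contrastive idea concrete, the sketch below shows one plausible reading of it: a standard CLIP-style symmetric InfoNCE loss over image and caption embeddings, modified so that off-diagonal pairs sharing the same patent class are masked out as negatives (since captions within a design class are semantically close, treating them as negatives would be misleading). This is a minimal illustration, not the paper's exact loss; the function name, the masking scheme, and the temperature value are all assumptions.

```python
import torch
import torch.nn.functional as F

def class_aware_clip_loss(image_emb, text_emb, class_ids, temperature=0.07):
    """CLIP-style symmetric contrastive loss with a class-aware mask.

    image_emb, text_emb: (B, D) embeddings of paired patent images/captions.
    class_ids: (B,) integer patent class labels (e.g. Locarno classes).
    Hypothetical sketch -- not the exact DesignCLIP objective.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (B, B) similarities

    # Off-diagonal pairs from the same class are likely false negatives:
    # exclude them from the softmax by setting their logits to -inf.
    same_class = class_ids.unsqueeze(0) == class_ids.unsqueeze(1)
    false_neg = same_class & ~torch.eye(len(class_ids), dtype=torch.bool)
    logits = logits.masked_fill(false_neg, float("-inf"))

    # Symmetric image-to-text and text-to-image cross-entropy, as in CLIP.
    targets = torch.arange(len(class_ids))
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

In practice the mask keeps each image matched to its own caption while dropping same-class distractors, which is one common way to adapt contrastive learning to label-aware settings.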