🤖 AI Summary
This study addresses identification bias in green patent classification. It proposes a natural language processing (NLP) method that uses vector representations of text to enlarge a baseline dictionary of environmental technology expressions, and applies it to official patent documents. Only about 20% of the patents that previous literature classifies as “green” turn out to be “true” green patents, and these receive about 1% fewer citations from subsequent inventions. For firms in the European Union, after controlling for reverse causality, holding at least one “true” green patent raises sales, market share, and productivity; high-novelty “true” green patents also yield higher profits. The approach offers a finer-grained way to measure green innovation and provides empirical evidence to inform policy and research on sustainable technological change.
📝 Abstract
This paper introduces a Natural Language Processing approach for identifying "true" green patents from official supporting documents. We start our training on about 12.4 million patents that previous literature had classified as green. We then train a simple neural network to enlarge a baseline dictionary through vector representations of expressions related to environmental technologies. After testing, we find that "true" green patents represent about 20% of all patents classified as green in previous literature. We show heterogeneity across technological classes, and then find that "true" green patents receive about 1% fewer citations from subsequent inventions. In the second part of the paper, we test the relationship between patenting and a dashboard of firm-level financial accounts in the European Union. After controlling for reverse causality, we show that holding at least one "true" green patent raises sales, market shares, and productivity. If we restrict the analysis to high-novelty "true" green patents, we find that they also yield higher profits. Our findings underscore the importance of using text analysis to build finer-grained patent classifications that are useful for policymaking in different domains.
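The dictionary-enlargement step can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the network, the similarity measure, or any cutoff. The sketch assumes that word vectors are already available, that new terms are admitted by cosine similarity to a seed term, and that a document is scored by the share of its tokens found in the enlarged dictionary. The toy vectors, the threshold value, and the names `expand_dictionary` and `green_share` are all hypothetical.

```python
import math

# Toy 3-dimensional embeddings; the values are invented for illustration.
# In practice they would come from a model trained on patent text.
TOY_VECTORS = {
    "solar": (0.9, 0.1, 0.0),
    "photovoltaic": (0.85, 0.15, 0.05),
    "wind": (0.8, 0.2, 0.1),
    "turbine": (0.6, 0.5, 0.2),
    "engine": (0.1, 0.9, 0.1),
    "software": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def expand_dictionary(seeds, vectors, threshold):
    """Add every term whose vector is close enough to at least one seed term.

    Only the original seeds are used as reference points, so the
    expansion is a single pass and does not chain through new terms.
    """
    expanded = set(seeds)
    for term, vec in vectors.items():
        if term in expanded:
            continue
        if any(cosine(vec, vectors[s]) >= threshold for s in seeds if s in vectors):
            expanded.add(term)
    return expanded


def green_share(text, dictionary):
    """Share of whitespace-separated tokens that appear in the dictionary."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in dictionary for t in tokens) / len(tokens)


# Hypothetical seed dictionary and threshold (assumptions, not from the paper).
enlarged = expand_dictionary({"solar"}, TOY_VECTORS, threshold=0.95)
score = green_share("photovoltaic wind engine software", enlarged)
```

With these toy vectors, a stricter threshold keeps the dictionary close to the seed terms, while a looser one admits more loosely related vocabulary such as "turbine"; where to set that trade-off is an empirical choice the sketch does not settle.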