Patent classification is the most important phase in analyzing technological trends. Patents are classified according to different parameters or the technological features they disclose. Note that a single patent may disclose more than one technological feature and can therefore be classified into more than one category.
A technology classification (taxonomy) can either be defined before the classification process starts or be built progressively as patents are classified one by one.
Depending upon the requirements and the scope of the project, different patent classification methods can be used to provide different insights into the patenting strategies and market dominance of one’s competitors.
- Class-based bucketing: Based on a pre-defined taxonomy (according to the technology sub-domains of the product/service line of the client), different relevant IPC/CPC classes are identified. This class-based classification is then replicated onto the extracted patent dataset, thus classifying the patents into various categories and sub-categories.
For example, consider Wireless and Broadcast communication technology (the broad category). Relevant classes such as H04W and H04H can be identified and then, on the basis of the definitions of their child classes, placed into different buckets (based on the taxonomy) representing sub-categories such as:
H04W80/00: Wireless network protocols
H04W88/00: Devices for wireless networks
H04W40/00: Communication routing
H04H20/00: Arrangements for broadcast
H04H2201/00: Aspects of broadcast communication
Publications carrying these classes are bucketed accordingly. In class-based bucketing, a patent may be bucketed multiple times into different categories. The accuracy achieved is moderate, and the time required is low to moderate depending on the number of categories in the taxonomy.
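The class-to-bucket mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not a production workflow: the `TAXONOMY` dictionary reuses the five main groups from the example, while the function names, publication numbers and class assignments are hypothetical and chosen only for demonstration.

```python
# Assumed taxonomy: CPC main group -> sub-category label (labels from the example above).
TAXONOMY = {
    "H04W80": "Wireless network protocols",
    "H04W88": "Devices for wireless networks",
    "H04W40": "Communication routing",
    "H04H20": "Arrangements for broadcast",
    "H04H2201": "Aspects of broadcast communication",
}

def main_group(cpc_symbol):
    """Return the part of a CPC symbol before the '/', with spaces removed."""
    return cpc_symbol.replace(" ", "").split("/")[0].upper()

def bucket_patent(cpc_symbols):
    """Return the sorted sub-categories whose main group matches any symbol."""
    buckets = {TAXONOMY[g] for g in map(main_group, cpc_symbols) if g in TAXONOMY}
    return sorted(buckets)

# Made-up publication numbers and class assignments, for illustration only.
patents = {
    "PUB-001": ["H04W 80/04", "H04W 88/08"],
    "PUB-002": ["H04H 20/71"],
    "PUB-003": ["G06F 17/30"],
}
bucketed = {pub: bucket_patent(symbols) for pub, symbols in patents.items()}
```

Here `PUB-001` lands in two buckets, which reflects the point that one patent may be bucketed multiple times, and `PUB-003` lands in none. Main groups are matched exactly rather than by prefix, so `H04H2201` is not confused with `H04H20`.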
- String-based bucketing: Based on a pre-defined taxonomy, search strings are formed for each sub-category using keywords specific to that domain and their synonyms. The hits returned by each string are analyzed and the string is refined to remove noise. After a few iterations, the desired dataset for a technology sub-category is obtained.
For example, the string for Wireless and Broadcast communication can be as follows:
ALL=(((wireless OR broadcast) NEAR5 communicat*) OR (wireless NEAR5 protocol*) OR ((antenna OR radio) NEAR3 construct*) OR (remote OR distant OR tele* OR online) OR (communicat* NEAR3 (rout* OR path)) OR ((frequency OR amplitude) NEAR3 modulat*) OR (transceiver OR receiver OR transmitter) OR (base station))
Though the strings are very specific, a small percentage of publications might go undetected by the search algorithm of the database because of language barriers (translations of some non-English publications might not be available). To compensate, combined keyword + class-based strings are formed, giving more reliable results. The patent portfolio is thus classified into various categories. The accuracy is slightly lower than that of class-based bucketing, while the time required is about the same.
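To make the proximity operators concrete, the following Python sketch evaluates a few clauses of the string above against a piece of text. It rests on assumptions: `NEARn` is taken to mean "within n words, in either order" and `*` is treated as a truncation wildcard, whereas the exact semantics differ between patent databases. The function names are hypothetical, and only four of the clauses are reproduced to keep the sketch short.

```python
import re
from fnmatch import fnmatchcase

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(tokens, patterns):
    """Indexes of tokens matching any pattern; '*' is a truncation wildcard."""
    return [i for i, tok in enumerate(tokens)
            if any(fnmatchcase(tok, p) for p in patterns)]

def near(tokens, left, right, distance):
    """True if a 'left' term and a 'right' term lie within `distance` tokens, in either order."""
    right_pos = positions(tokens, right)
    return any(i != j and abs(i - j) <= distance
               for i in positions(tokens, left) for j in right_pos)

def matches_wireless_broadcast(text):
    """Evaluate a subset of the example string's clauses (assumed NEAR semantics)."""
    tokens = tokenize(text)
    return (
        near(tokens, ["wireless", "broadcast"], ["communicat*"], 5)
        or near(tokens, ["wireless"], ["protocol*"], 5)
        or near(tokens, ["communicat*"], ["rout*", "path"], 3)
        or near(tokens, ["frequency", "amplitude"], ["modulat*"], 3)
    )
```

For instance, `matches_wireless_broadcast("A method for wireless data communication between nodes")` is true because "wireless" and "communication" are two tokens apart, while a text about an unrelated mechanical part produces no hit. Running such a check on a sample of known patents is one way to spot noisy or overly narrow clauses during the refinement iterations.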
- Manual Bucketing: Each patent in the dataset is analyzed thoroughly by experienced researchers. Depending on the type of invention and the key features that the publication discloses, it is classified into one of the categories, according to the pre-defined taxonomy or a taxonomy that is built up during the manual analysis. Compared with the two methods above, manual bucketing has the highest accuracy (human intelligence being the contributing factor), but it also takes the most time.
- Automated Patent Classification using an NLP model: The adoption of NLP and AI-based auto-classification of patents has been sporadic. Automating patent classification not only helps reduce human error but also accelerates the classification process. Keywords and synonyms pertaining to specific sub-categories (according to the pre-defined taxonomy) are identified and fed into a Natural Language Processing model, which uses context analysis and lexical semantics to determine the central idea behind each invention and thereby classifies the patent portfolio into different categories. The accuracy achieved using such a model is moderate, and the time required is low.
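As a rough illustration of the keyword-driven idea behind automated classification, the sketch below scores a patent abstract against a keyword profile per sub-category using bag-of-words cosine similarity. This is a deliberately simple stand-in, not the NLP model described above: it performs no stemming, synonym handling or context analysis. The seed keywords, the threshold value and all function names are assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical seed keywords per sub-category; a real taxonomy would be much richer.
SEED_KEYWORDS = {
    "Wireless network protocols": "wireless protocol handshake layer signalling network",
    "Communication routing": "routing route path hop forwarding network",
    "Arrangements for broadcast": "broadcast broadcasting transmitter programme distribution",
}

def vectorize(text):
    """Bag-of-words term counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0

PROFILES = {label: vectorize(words) for label, words in SEED_KEYWORDS.items()}

def classify(abstract, threshold=0.1):
    """Return (label, score) for the best category, or (None, score) below the threshold."""
    vec = vectorize(abstract)
    label, score = max(((lbl, cosine(vec, prof)) for lbl, prof in PROFILES.items()),
                       key=lambda pair: pair[1])
    return (label, score) if score >= threshold else (None, score)
```

Calling `classify("A routing method selects the next hop along a path in a mesh network")` picks "Communication routing", since four of that profile's keywords appear in the text. Abstracts that fall below the threshold are returned with `None` as the label, which makes them natural candidates for manual review.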
The choice among the above classification methods depends on the size of the portfolio, the accuracy needed, the time allotted to the project and the client’s budget for competitive benchmarking; the resources (number of people) assigned to the project are decided accordingly. Where the highest accuracy is required, manual analysis is needed, as NLP alone is not sufficient. NLP can, however, be used for a helicopter view of the overall portfolio.
At ResearchWire Knowledge Solutions, we follow a strong methodology and a robust process, tailored to customer needs, to evaluate patent data and deliver what the client requires. The ResearchWire team consists of experienced Patent and Data Analysts who come from different industries. We understand client requirements well and deliver useful insights using advanced data visualization tools to make the client’s decision-making process easier and more effective.