Finding prior art for a non-English patent

Finding prior art can be like looking for a needle in a haystack. When the search has to be done for a non-English patent, the challenge is even greater.

Prior art for a non-English patent may well turn up in English-language patents, but it is always advisable to also search exhaustively in patents written in the same (non-English) language, because the chances of finding linguistic synonyms and their equivalents are higher.

In general, the most important non-English patent jurisdictions are China, Japan, Korea, Taiwan, Singapore, Germany and France. These countries publish cutting-edge research literature on a very large scale.

The databases generally used to obtain English translations for non-English prior art searches are Google Patents, Derwent Innovation, Questel Orbit and Espacenet. Because the English translations from different databases can differ in sentence structure and word choice, it helps to read all the versions and compare them with one another. Doing so yields insights as well as key keywords related to the idea of the invention.

For example, the claims of a Japanese publication (JP2016008417A) were translated through Google Patents, through Espacenet and through PAJ. Each translation produced different text, with different sentence formation and keywords. In all three cases the preamble reads differently, because each of the three databases uses its own translation engine.

Google Patents: A latch mechanism that detachably locks the striker and has an operation receiving portion for releasing the lock of the striker;

Espacenet: A latch mechanism having an operation receiving portion for releasably locking the striker and releasing the lock of the striker

PAJ: A latch mechanism having an operation receiving part for detachably locking the striker and releasing the lock of the striker


The Espacenet translation also supplies a synonym for detachable, namely releasable. Other keywords and linguistic synonyms can be found in the same way, and these synonyms can then be used to perform a more exhaustive prior art search.
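The comparison can be partly mechanised. The following is a minimal sketch, assuming the translated preambles are available as plain strings; the function name `variant_terms` is hypothetical and the tokenisation is deliberately naive. It lists the words that appear in some translations but not in all of them, which are the natural candidates for synonyms.

```python
import re

def variant_terms(translations):
    """Return, per source, the words that are not shared by every translation."""
    # Naive tokenisation (an assumption): lowercase alphabetic words only.
    token_sets = {
        source: set(re.findall(r"[a-z]+", text.lower()))
        for source, text in translations.items()
    }
    common = set.intersection(*token_sets.values())
    return {source: sorted(tokens - common) for source, tokens in token_sets.items()}

translations = {
    "Google Patents": "A latch mechanism that detachably locks the striker and has "
                      "an operation receiving portion for releasing the lock of the striker",
    "Espacenet": "A latch mechanism having an operation receiving portion for "
                 "releasably locking the striker and releasing the lock of the striker",
    "PAJ": "A latch mechanism having an operation receiving part for "
           "detachably locking the striker and releasing the lock of the striker",
}

for source, terms in variant_terms(translations).items():
    print(source, terms)
```

Words such as "releasably" and "part" surface only for the source that used them, so they can be reviewed quickly and added to the keyword list where appropriate.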

Several methods can be deployed to find prior art for non-English patents effectively:

  1. When a relevant result has been identified, its citations are checked to identify more relevant prior art. If more relevant prior art is found this way, the next step is to check why it did not show up in the initial search. The cause may be that relevant keywords or technology classes were not included. An iterative process can therefore be deployed in which keywords and technology classes are updated together.
  2. A reverse search string strategy can also be used, i.e. converting the search strings into the language of the subject patent using translators and then running them on a local-language database or on general databases such as Google itself. For example,
    1. a search engine used for prior art searches in the Japanese language,
    2. the State Intellectual Property Office website of China,
    3. a database of Chinese academic journals,
    4. the official German register for patents, utility models, trademarks and designs.

The results obtained from such local-language databases are then translated into English for better understanding and comparison. This makes the search more comprehensive and reduces the chance of missing relevant prior art.

  3. Further, if the patent has images that clearly depict the novelty of the invention, image analysis should be an important sub-part of the prior art search, since images are not affected by linguistic variations and interpretations.
  4. In any case, associates must be trained in the specific native languages so that they can search local-language databases effectively.
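The iterative process described in the first method of this list (checking why a cited document was missed, then updating keywords and classes together) can be sketched as follows. This is only an illustration under stated assumptions: the record layout (`keywords`, `classes`), the function names `search_gaps` and `refine`, and the sample values are all hypothetical and not taken from any real database.

```python
def search_gaps(query_keywords, query_classes, missed_doc):
    """Report the keywords and classes of a missed document that the query lacked."""
    doc_keywords = {k.lower() for k in missed_doc["keywords"]}
    have_keywords = {k.lower() for k in query_keywords}
    return {
        "keywords": sorted(doc_keywords - have_keywords),
        "classes": sorted(set(missed_doc["classes"]) - set(query_classes)),
    }

def refine(query_keywords, query_classes, missed_docs):
    """One iteration: fold the gaps from every missed document back into the query."""
    keywords = {k.lower() for k in query_keywords}
    classes = set(query_classes)
    for doc in missed_docs:
        gaps = search_gaps(keywords, classes, doc)
        keywords.update(gaps["keywords"])
        classes.update(gaps["classes"])
    return sorted(keywords), sorted(classes)

# Illustrative data only: a document found through citations, not the initial search.
missed = [{"keywords": ["Latch", "releasable", "striker"], "classes": ["E05B", "E05C"]}]
new_keywords, new_classes = refine(["latch", "striker", "detachable"], ["E05B"], missed)
print(new_keywords, new_classes)
```

Running the refined query and repeating the check on any newly found citations gives the iteration; in practice a researcher would review each suggested term before adding it.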


Searching for prior art in a non-English language has its own challenges, but the process becomes smooth, efficient and fast with the right combination of databases, experienced researchers and a few smart tips and tricks. Even so, it is always better to have researchers who can deploy these strategies and are also proficient in the language in which the patent is filed.


At ResearchWire Knowledge Solutions, our associates are proficient in different languages and deploy various search strategies to find the most promising prior art for any non-English patent.


Different Patent Classification Techniques: When to use What!!

Patent classification is the most important phase in analyzing technological trends. Patents are classified according to different parameters or the technological features they disclose. Note that a single patent may disclose more than one technological feature and can therefore be placed in more than one category.

A technology classification (taxonomy) can either be defined before the classification process starts or be built progressively as patents are classified one by one.

Depending upon the requirements and the scope of the project, different patent classification methods can be used to provide different insights into the patenting strategies and market dominance of one’s competitors.

  1. Class-based bucketing: Based on a pre-defined taxonomy (according to the technology sub-domains of the product/service line of the client), different relevant IPC/CPC classes are identified. This class-based classification is then replicated onto the extracted patent dataset, thus classifying the patents into various categories and sub-categories.

For example, consider Wireless and Broadcast communication technology (the broad category). Relevant classes such as H04W and H04H are identified and then, on the basis of the definitions of their child classes, put into different buckets (based on the taxonomy) representing sub-categories such as

H04W80/00: Wireless network protocols

H04W88/00: Devices for wireless networks

H04W40/00: Communication routing

H04H20/00: Arrangements for broadcast

H04H2201/00: Aspects of broadcast communication

Publications carrying these classes are bucketed accordingly. In class-based bucketing, a patent may be bucketed multiple times into different categories. The accuracy achieved is moderate, and the time required is low to moderate depending on the number of categories in the taxonomy.
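As a minimal sketch of class-based bucketing, the taxonomy in this example can be held as a mapping from CPC main-group prefix to sub-category. The function name `bucket_by_class`, the prefix-matching rule and the sample CPC codes are assumptions made for illustration.

```python
# Taxonomy from the example: CPC main-group prefix -> sub-category.
TAXONOMY = {
    "H04W80/": "Wireless network protocols",
    "H04W88/": "Devices for wireless networks",
    "H04W40/": "Communication routing",
    "H04H20/": "Arrangements for broadcast",
    "H04H2201/": "Aspects of broadcast communication",
}

def bucket_by_class(cpc_codes):
    """Return every sub-category whose main group matches one of the patent's CPC codes."""
    buckets = set()
    for code in cpc_codes:
        normalized = code.replace(" ", "").upper()
        for prefix, category in TAXONOMY.items():
            if normalized.startswith(prefix):
                buckets.add(category)
    # A patent can land in several buckets, or in none if no class matches.
    return sorted(buckets)

print(bucket_by_class(["H04W 88/02", "H04W 40/24", "G06F 3/01"]))
```

The sample patent falls into two buckets at once, which is the multiple-bucketing behaviour this method allows.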

  2. String-based bucketing: Based on a pre-defined taxonomy, strings are formed for each of the sub-categories using keywords specific to that domain and their linguistic synonyms. The hits returned by each string are analyzed and the string is refined to remove noise. After a few iterations, the desired dataset for a technology sub-category is obtained.

For example, the string for Wireless and Broadcast communication can be as follows:

ALL=(((wireless OR broadcast) NEAR5 communicat*) OR (wireless NEAR5 protocol*) OR ((antenna OR radio) NEAR3 construct*) OR (remote OR distant OR tele* OR online) OR (communicat* NEAR3 (rout* OR path)) OR ((frequency OR amplitude) NEAR3 modulat*) OR (transceiver OR receiver OR transmitter) OR (base station))


Though the strings are very specific, a small percentage of publications may go undetected by the database's search algorithm because of linguistic barriers (translations of some non-English publications may not be available). For this reason, strings combining keywords and classes are formed, which give optimized and more reliable results. The patent portfolio is thus classified into various categories. The accuracy is slightly lower than that of class-based bucketing, while the time required is about the same.
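To test a string against a handful of abstracts before running it on a database, a proximity operator can be approximated locally. The sketch below is an assumption-laden stand-in: the exact semantics of NEAR differ between databases, and here it is taken to mean that the two word positions differ by at most n, in either order. The helper names `near` and `matches_wireless_string` are hypothetical, and wildcards such as communicat* are written as the regular expression `communicat\w*`.

```python
import re

def near(text, left, right, max_gap):
    """True if a word matching `left` lies within `max_gap` positions of one matching `right`.

    Both patterns are regular expressions matched against whole lowercase words.
    """
    words = re.findall(r"\w+", text.lower())
    left_pos = [i for i, w in enumerate(words) if re.fullmatch(left, w)]
    right_pos = [i for i, w in enumerate(words) if re.fullmatch(right, w)]
    return any(abs(i - j) <= max_gap for i in left_pos for j in right_pos if i != j)

def matches_wireless_string(text):
    """Evaluate a shortened version of the example string against one text."""
    return (
        near(text, r"wireless|broadcast", r"communicat\w*", 5)
        or near(text, r"wireless", r"protocol\w*", 5)
        or near(text, r"frequency|amplitude", r"modulat\w*", 3)
    )

print(matches_wireless_string("A base station schedules wireless data communication."))
print(matches_wireless_string("A latch mechanism for a vehicle door."))
```

Checking a few known-relevant and known-irrelevant abstracts this way gives a quick feel for how much noise a clause lets in before the string is refined.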


  3. Manual bucketing: Each patent in the dataset is analyzed thoroughly by experienced researchers. Depending on the type of invention and the key features that the publication discloses, it is classified into one of the categories, according to the pre-defined taxonomy or a taxonomy that is built up during the manual analysis. Compared with the two methods above, manual bucketing has the highest accuracy (human intelligence being the contributing factor), but it also takes the most time.


  4. Automated patent classification using an NLP model: The adoption of NLP and AI-based auto-classification of patents has been sporadic. Automating patent classification not only helps to reduce human error but also accelerates the process. Keywords and synonyms pertaining to specific sub-categories (according to the pre-defined taxonomy) are identified and fed into a Natural Language Processing model, which uses context analysis and lexical semantics to determine the central idea behind the invention and so classifies the patent portfolio into different categories. The accuracy achieved using such a model is moderate and the time required is low.
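The idea of feeding category keywords to a model and scoring each patent against them can be illustrated with a very small example. The sketch below is not an NLP model in any real sense: it is a plain bag-of-words cosine similarity, used here as a deliberately simple stand-in, and the category keywords, threshold and function names are all assumptions.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(abstract, category_keywords, threshold=0.1):
    """Return the best-scoring category, or None if nothing clears the threshold."""
    doc = vectorize(abstract)
    scores = {
        name: cosine(doc, vectorize(" ".join(keywords)))
        for name, keywords in category_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# Hypothetical keyword lists for two of the sub-categories in the taxonomy.
CATEGORY_KEYWORDS = {
    "Communication routing": ["routing", "route", "path", "hop", "forwarding"],
    "Devices for wireless networks": ["terminal", "base", "station", "transceiver", "antenna"],
}

print(classify("A transceiver in the base station selects an antenna for each terminal.",
               CATEGORY_KEYWORDS))
```

A production system would replace the word counts with a trained language model, but the overall shape (score each patent against every category, keep the best match above a threshold) stays the same.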


The choice among these classification methods depends on the size of the portfolio, the accuracy needed, the time allotted to the project and the client’s budget for competitive benchmarking. The resources (number of people) allotted to the project are decided accordingly. Where the highest accuracy is required, manual analysis is needed, as even NLP is not sufficient. NLP can be used for a helicopter view of the overall portfolio.

At ResearchWire Knowledge Solutions, we follow a strong methodology and a robust process to evaluate patent data and deliver what the client requires, depending on the client’s needs. The ResearchWire team consists of experienced patent and data analysts who come from different industries. We understand the client requirements well and deliver useful insights using advanced data visualization tools to make the client’s decision-making process easier and more effective.


Whitespace analysis: A smart step towards research, innovation and securing patent rights

How to find whitespaces

Recently, during a client discussion about whitespace analysis, the client pointed out that not many companies in his technology domain file for patents. It would therefore be inaccurate to find whitespaces just by analyzing a patent dataset.

So the question was: what would be a smart way to find whitespaces in such cases?

Before jumping straight to the answer, let’s look at what whitespace analysis actually is and why it matters for any company or organization that wants to capture the market ahead of its competitors.

Whitespace analysis helps to identify overcrowded and sparse areas in a technology domain. It helps in identifying new opportunities for innovation in less competitive areas.

How to go about the whitespace analysis

For any whitespace analysis, a scope is defined in terms of what is expected from the analysis. For example, whitespaces can be identified in terms of material, etc.

Similarly, many other parameters can be selected. After the scope is defined, relevant patents are identified using a combination of keywords and classes. All the patents are then analyzed according to the parameters defined and classified using one of the patent classification methods. Generally, once the classification is done, the areas with fewer patent filings are considered whitespaces.
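The count-based step can be sketched in a few lines, assuming each classified patent is represented simply as the list of categories it was bucketed into. The function name `sparse_categories`, the threshold and every category name other than "Material" are hypothetical.

```python
from collections import Counter

def sparse_categories(patent_categories, taxonomy, max_filings):
    """Return taxonomy categories with at most `max_filings` patents, sparsest first."""
    counts = Counter(category for cats in patent_categories for category in cats)
    # Counter returns 0 for categories with no filings at all.
    sparse = [(counts[c], c) for c in taxonomy if counts[c] <= max_filings]
    return [category for _, category in sorted(sparse)]

# Illustrative dataset: one inner list per patent.
taxonomy = ["Material", "Process", "Application"]
patents = [["Process"], ["Process", "Application"], ["Process"], ["Application"]]
print(sparse_categories(patents, taxonomy, max_filings=2))
```

Iterating over the taxonomy rather than over the patents matters here, because a category with zero filings never appears in the dataset and would otherwise be missed.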

But this approach may not give a foolproof picture of the whitespaces, for several reasons:

  1. A company does not necessarily file for patents in a particular technology domain.
  2. The technology may be old enough that its patents were not captured in the dataset because of a date restriction.

Therefore, a 360-degree analysis is needed to shortlist the whitespaces. Apart from patents, it is important to look into non-patent literature, which includes both research papers and the products available in the market for the related technology domain.
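One way to picture this broader check is to treat a category as a whitespace candidate only when it is sparse across patents, research papers and products alike. The sketch below assumes per-category counts have already been gathered for each source; the function name `shortlist_whitespaces`, the threshold and the sample numbers are hypothetical.

```python
def shortlist_whitespaces(patents, papers, products, max_items):
    """Keep only categories that are sparse in patents, papers and products alike."""
    categories = set(patents) | set(papers) | set(products)
    return sorted(
        c for c in categories
        # A category missing from a source is counted as zero for that source.
        if max(patents.get(c, 0), papers.get(c, 0), products.get(c, 0)) <= max_items
    )

# Illustrative counts per category for each source.
patent_counts = {"Material": 0, "Process": 1, "Application": 9}
paper_counts = {"Material": 1, "Process": 14}
product_counts = {"Material": 0, "Process": 6, "Application": 3}
print(shortlist_whitespaces(patent_counts, paper_counts, product_counts, max_items=2))
```

In this made-up data, "Process" looks like a whitespace on patent counts alone but is crowded in papers and products, so it drops off the shortlist.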

To answer the client’s question, we suggested that whitespaces cannot be decided on the basis of the number of filed patents alone. Other sources in the technology domain, such as existing products and research work, must also be included.

