Document Classification: Automated Legal Patent Rejection

Business Challenge

As the number of patent applications continues to rise over the years, so does the number of patent claims. A successful patent registration gives the patent holder a significant competitive advantage. Firms receive hundreds of claims from patent holders asserting that their patents have been infringed. Reading and processing these documents takes upwards of tens of thousands of potential billable hours. It is in the best interest of firms to pursue the claims that are likely to result in a profit and set aside those that will not. Therefore, an algorithm that accurately and automatically predicts the outcome of legal patent applications would save millions of dollars in opportunity cost.

 

SFL's Approach

SFL-Scientific’s approach was to process the intellectual property claims and deliver them to the firm in ranked order. The claims were converted into digital text through optical character recognition (OCR). Inconsistencies and erroneous conversions were corrected manually before the text was analyzed. Using methods such as the bag-of-words model, term frequency-inverse document frequency (TF-IDF), and n-grams, the documents were classified based on their textual features. Certain types of intellectual property patents experience a disproportionate amount of litigation, and these algorithms can identify them. The machine learning model was built on these textual features by training it on previously filed patents with known outcomes. The model provides a highly accurate prediction of the probability of success of any particular claim.
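To make the feature extraction step more concrete, the following is a minimal sketch of how bag-of-words counts, n-grams, and TF-IDF weights can be combined. It is an illustration only, not SFL Scientific's actual pipeline: the function names `ngrams` and `tfidf`, the whitespace tokenization, the choice of unigrams plus bigrams, and the toy claim texts are all assumptions made for the example.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and return all 1..n_max word n-grams."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(documents, n_max=2):
    """Return one {term: weight} dict per document.

    Weight = (term count / terms in document) * log(N / document frequency).
    This is the plain, unsmoothed textbook form; libraries often smooth it.
    """
    # Bag-of-words (here: bag-of-n-grams) counts for each document.
    bags = [Counter(ngrams(doc, n_max)) for doc in documents]
    n_docs = len(bags)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())

    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return vectors


# Hypothetical toy claim texts, not real data.
claims = [
    "wireless signal encoding method",
    "wireless charging device",
    "method for encoding video",
]
vectors = tfidf(claims)
```

In this sketch, a term that appears in fewer documents receives a higher weight than an equally frequent term that appears in many, which is what lets a downstream classifier focus on the wording that distinguishes one kind of claim from another. The resulting vectors would then be paired with known outcomes to train a model.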

 

Business Value

The business value for the firm is substantial. The need for lawyers to read and process the intellectual property claims is reduced, if not entirely eliminated. The firm gains tens of thousands of potential billable hours for its lawyers. The ranked list of claims provides a platform for making strategic decisions on which ones to take on. Favorable claims appear higher in the list, whereas claims ranked below a certain threshold are automatically rejected. The metrics for deciding the success of a claim are built upon thousands of previous claims and highly advanced algorithms, so the predicted probability of a claim's success is considerably more accurate than one based on a manual reading of the document.
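The ranking and automatic rejection described above can be sketched in a few lines. This is an illustrative example under stated assumptions: the function name `rank_claims`, the claim identifiers, the probabilities, and the 0.5 cutoff are all hypothetical, since the source does not state the threshold or how scores are stored.

```python
def rank_claims(scored_claims, threshold=0.5):
    """Sort (claim_id, probability) pairs from most to least favorable.

    Claims whose predicted probability of success falls below `threshold`
    are separated out as automatically rejected. The default of 0.5 is an
    assumed value for illustration only.
    """
    ranked = sorted(scored_claims, key=lambda item: item[1], reverse=True)
    accepted = [(cid, p) for cid, p in ranked if p >= threshold]
    rejected = [(cid, p) for cid, p in ranked if p < threshold]
    return accepted, rejected


# Hypothetical model outputs: (claim id, predicted probability of success).
scores = [("claim-A", 0.35), ("claim-B", 0.91), ("claim-C", 0.62)]
accepted, rejected = rank_claims(scores)
```

Keeping the rejected claims in a separate list, rather than discarding them, would let the firm audit the cutoff later and adjust it as more outcomes become known.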