Pay Equality Modelling

Business Challenge

Wage gaps between gender and ethnic groups receive a lot of media coverage and are a major issue in our society. Identifying and addressing them accurately helps shield companies from lawsuits and bad publicity, whilst promoting equality and fairness. For large organizations these gaps can be difficult to identify, as they are spread across thousands or tens of thousands of employees in a variety of roles, locations, and levels of experience. Analyzing this data correctly is important, as surface-level trends don’t always tell an accurate story at the individual level.

 

SFL’s Approach

SFL addresses this issue by developing a predictive salary model trained on one specific demographic group, using all available attributes except the demographic split itself (job history, location, experience, position). This model is first validated and tuned until it accurately predicts salary within that group. We then apply it to other demographic groups to obtain the salaries their members would be expected to earn. A wage gap is evident if the model trained on one race or gender does a poor job of predicting salaries for another race or gender. The discrepancy between the model's prediction and the actual salary is used to determine the wage gap for each individual. This approach also provides higher resolution than simply fitting a model over the whole dataset, as it allows specific groups or types of employees who are being unfairly treated to be identified and the size of the gap to be quantified.
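
The idea can be illustrated with a minimal sketch. It assumes a plain least-squares regression stands in for the tuned predictive model, and the feature names, salary figures, and helper functions (`fit_ols`, `predict`) are hypothetical, not taken from the project. The model is fitted on a reference group only, then applied to a comparison group, and the per-person difference between predicted and actual salary is read as the estimated gap.

```python
from statistics import mean

def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

# Hypothetical data: (years_experience, job_level) -> salary.
reference_X = [(1, 1), (3, 1), (5, 2), (8, 2), (10, 3), (12, 3)]
reference_y = [40000 + 2000 * yrs + 10000 * lvl for yrs, lvl in reference_X]

comparison_X = [(2, 1), (6, 2), (11, 3)]
# Constructed so that each person is paid 3000 less than the reference pattern.
comparison_y = [40000 + 2000 * yrs + 10000 * lvl - 3000
                for yrs, lvl in comparison_X]

# Train on the reference group only; the demographic label is never a feature.
coef = fit_ols(reference_X, reference_y)

# Positive gap = the model expected a higher salary than was actually paid.
gaps = [predict(coef, x) - y for x, y in zip(comparison_X, comparison_y)]
mean_gap = mean(gaps)
```

Because the gap is computed per individual, it can afterwards be averaged by role, location, or any other attribute to see where discrepancies concentrate.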

 

Business Value

Almost every company can benefit from this type of analysis. Companies gain highly granular information, down to the individual level, on how their minority employees are being paid relative to other groups. This allows them to address these issues accurately and specifically. Problems that are localized to certain demographics, certain roles, or employees of certain demographics in certain roles can also be identified. Companies can then correct these issues or, if no issues are found, be confident that they are doing the right thing. Similar types of modelling can be performed in a wide variety of domains using employee data, including skills mapping, optimizing internal transfers, and numerous other HR-related projects.

 

 

Market Segmentation

Business Challenge

Market segmentation is a typical analysis that allows companies to better understand their customer base. By broadly dividing a company's customers into segments, a company can understand and market to particular varieties of customers. 

According to the Bureau of Labor Statistics, California had approximately 3,700 food and beverage manufacturers in 2015.[1] The industry is highly competitive, and manufacturers face the twin challenges of boosting sales and increasing customer loyalty. One of our customers, a worldwide beverage provider, needed to segment its customers in order to understand the national market. It offers multiple flavors and sizes of beverage, but did not know how to allocate the different products to different markets to increase sales.

 

SFL's Approach

SFL worked with the customer to build a hierarchical clustering model that groups customers and markets using purchase history and demographic data such as age, sex, education level, income level, and occupation. The segmentation was carried out in R using daisy and flashClust, the latter being a small but fast implementation of hierarchical clustering.
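
The two steps can be sketched in a few lines of Python. This is an illustration under assumptions rather than the project code: the dissimilarity is a hand-written Gower-style measure (the kind of mixed numeric and categorical distance that daisy can compute), the clustering is a naive average-linkage routine rather than flashClust, and the customer records and field names are invented.

```python
def gower_matrix(records, numeric_keys, categorical_keys):
    """Pairwise Gower-style dissimilarity: range-scaled numeric gaps
    averaged with 0/1 category mismatches."""
    ranges = {}
    for k in numeric_keys:
        vals = [r[k] for r in records]
        ranges[k] = (max(vals) - min(vals)) or 1.0  # avoid dividing by zero
    n = len(records)
    n_vars = len(numeric_keys) + len(categorical_keys)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = sum(abs(records[i][k] - records[j][k]) / ranges[k]
                        for k in numeric_keys)
            total += sum(records[i][k] != records[j][k]
                         for k in categorical_keys)
            d[i][j] = d[j][i] = total / n_vars
    return d

def agglomerate(dist, n_clusters):
    """Naive average-linkage agglomerative clustering, stopped at n_clusters."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(linkage(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)  # closest pair of clusters
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical customers mixing demographics with a purchase-history field.
customers = [
    {"age": 22, "income": 28000, "education": "college", "favorite": "soda"},
    {"age": 25, "income": 31000, "education": "college", "favorite": "soda"},
    {"age": 24, "income": 30000, "education": "college", "favorite": "soda"},
    {"age": 52, "income": 90000, "education": "graduate", "favorite": "tea"},
    {"age": 58, "income": 95000, "education": "graduate", "favorite": "tea"},
    {"age": 55, "income": 88000, "education": "graduate", "favorite": "tea"},
]

dist = gower_matrix(customers, ["age", "income"], ["education", "favorite"])
segments = agglomerate(dist, n_clusters=2)
```

On this toy data the two segments are the three younger soda buyers and the three older tea buyers. In practice the number of segments would be chosen by inspecting the full merge tree rather than fixed in advance.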

 

Business Value

The hierarchical analysis gave the beverage provider a better understanding of its customers' demographics and target markets. For example, in developed countries more and more customers are switching from soft drinks to healthier alternatives such as juices, teas, coffee, and water mixers. Developing nations, on the other hand, offer tremendous potential for volume growth in soft drinks. In addition, SFL helped the provider understand its market base in terms of revenue share, which supported its strategic planning for the coming years.

 

[1] https://www.cmtc.com/blog/food-and-beverage-manufacturing-trends-and-challenges-2016

 

Document Classification: Automated Legal Patent Rejection

Business Challenge

As the number of patent applications continues to rise over the years, so does the number of patent claims. A successful patent registration gives the patent holder a significant competitive advantage. Firms receive hundreds of claims from patent holders asserting that their patents have been infringed. Reading and processing these documents can take tens of thousands of potentially billable hours. It is in the best interest of firms to pursue the claims that are likely to result in a profit and set aside those that are not.

 

SFL's Approach

SFL-Scientific’s approach was to process the intellectual property claims and deliver them to the firm in ranked order. The claims were converted into digital text using optical character recognition, and inconsistencies and erroneous conversions were corrected manually before the text was analyzed. Using methods such as the bag-of-words model, term frequency-inverse document frequency (TF-IDF), and n-grams, the documents were converted into textual features. Certain types of intellectual property patents experience a disproportionate amount of litigation, and these algorithms can identify them. The machine learning model was built on these textual features by training it on previously filed patents with known outcomes. The model predicts the probability of success of any particular claim with high accuracy.
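
A compact sketch of this pipeline is given below. It assumes the text has already been through OCR and cleaning, and it uses a small hand-written logistic regression as a stand-in classifier, since the source does not say which model was used. The claim texts, labels, and function names are all hypothetical.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Bag of word n-grams (unigrams and bigrams by default)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]

def fit_idf(docs):
    """Inverse document frequency for every n-gram seen in training."""
    df = Counter(term for doc in docs for term in set(ngrams(doc)))
    return {t: math.log(len(docs) / c) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are dropped."""
    counts = Counter(t for t in ngrams(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def train_logistic(vectors, labels, epochs=200, lr=0.5):
    """Plain stochastic-gradient logistic regression on sparse vectors."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            z = bias + sum(weights[t] * v for t, v in vec.items())
            err = 1.0 / (1.0 + math.exp(-z)) - y
            bias -= lr * err
            for t, v in vec.items():
                weights[t] -= lr * err * v
    return weights, bias

def success_probability(doc, idf, weights, bias):
    z = bias + sum(weights[t] * v for t, v in tfidf(doc, idf).items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical past claims with known outcomes (1 = succeeded, 0 = failed).
past_claims = [
    ("wireless signal encoding method infringed by handset maker", 1),
    ("wireless antenna encoding circuit copied in handset", 1),
    ("garden hose nozzle shape resembles generic design", 0),
    ("generic garden tool handle shape dispute", 0),
]
texts = [t for t, _ in past_claims]
labels = [y for _, y in past_claims]

idf = fit_idf(texts)
weights, bias = train_logistic([tfidf(t, idf) for t in texts], labels)

# Rank incoming claims from most to least promising.
new_claims = ["generic garden nozzle shape",
              "wireless encoding circuit used in new handset"]
ranked = sorted(new_claims,
                key=lambda c: success_probability(c, idf, weights, bias),
                reverse=True)
```

The sorted list is what a reviewer would see: claims resembling previously successful filings float to the top, and a cut-off on the predicted probability could be applied to the tail.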

 

Business Value

The business value is substantial for the firm. The need for lawyers to read and process the intellectual property claims is reduced, if not entirely eliminated, freeing up tens of thousands of potentially billable hours. The sorted ranking of claims provides a platform for making strategic decisions about which ones to take on. Favorable claims appear higher in the list, whereas claims ranked below a certain threshold are automatically rejected. The metrics for judging the likely success of a claim are built on thousands of previous claims and advanced algorithms, so the estimated probability of a claim's success is considerably more accurate than one based on a manual reading of the document.

 

Energy Usage Disaggregation

Business Challenge

Many companies collect and use data containing information that is not practical for a human to extract. A company may gather data for billing purposes but also be able to learn a great deal about its customers through proper analysis of that data. Extracting this information and the value it contains allows companies to better understand their customers and their behavior. For energy usage data, the benefits are threefold: 1) households learn which appliances are using the most energy, 2) personalised feedback can quantify the savings behind appliance-specific advice, and 3) recommender systems can be built to inform households of the savings they could achieve. In this particular case, SFL worked on data from a national European energy provider to expand their offerings.

 

SFL's Approach

SFL works to extract as much information as possible from clients’ data. For this European energy provider, we accomplished this by implementing algorithms to disaggregate their data. Disaggregation converts a time series of power data into a breakdown of when each individual appliance (whether a refrigerator, toaster, or heater) was turned on or off. The algorithms took raw time-series data and trained variants of Hidden Markov Models (HMMs) known as combinatorial HMMs. This produced a model that separated, per customer, the underlying behaviour of individual appliances. This increased resolution was always present in the data, and SFL’s expertise was able to unlock it.
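
To make the idea concrete, here is a toy sketch that decodes a single meter reading series into per-appliance on/off timelines. It reads "combinatorial" as an HMM whose hidden state is the combination of every appliance's on/off state, which is an assumption on our part. The appliance wattages, transition probabilities, and noise level are invented and fixed by hand, so the training step described above is skipped entirely.

```python
import math
from itertools import product

# Hypothetical appliances: power draw when on (watts) and the probability
# of staying in the current on/off state from one reading to the next.
APPLIANCES = {"fridge": {"watts": 150.0, "stay": 0.9},
              "heater": {"watts": 1000.0, "stay": 0.95}}
NOISE_STD = 30.0  # assumed meter noise in watts

names = list(APPLIANCES)
joint_states = list(product([0, 1], repeat=len(names)))  # all on/off combos

def log_transition(prev, nxt):
    """Appliances switch independently, so joint log-probabilities add."""
    total = 0.0
    for name, p, n in zip(names, prev, nxt):
        stay = APPLIANCES[name]["stay"]
        total += math.log(stay if p == n else 1.0 - stay)
    return total

def log_emission(state, reading):
    """Gaussian log-likelihood (up to a constant) of the total meter reading."""
    expected = sum(APPLIANCES[n]["watts"] * on for n, on in zip(names, state))
    return -0.5 * ((reading - expected) / NOISE_STD) ** 2

def disaggregate(readings):
    """Viterbi decoding over the joint state space (uniform initial state)."""
    score = {s: log_emission(s, readings[0]) for s in joint_states}
    back = []
    for reading in readings[1:]:
        new_score, pointers = {}, {}
        for s in joint_states:
            best_prev = max(joint_states,
                            key=lambda p: score[p] + log_transition(p, s))
            new_score[s] = (score[best_prev] + log_transition(best_prev, s)
                            + log_emission(s, reading))
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the most likely final state.
    state = max(joint_states, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [dict(zip(names, s)) for s in path]

# Total household power in watts, one value per time step (made up).
readings = [5, 148, 155, 1160, 1145, 1010, 995, 3]
timeline = disaggregate(readings)
fridge_on = [step["fridge"] for step in timeline]
heater_on = [step["heater"] for step in timeline]
```

The joint state space doubles with every appliance added, so this exact decoding only scales to a handful of devices; larger households would need approximate inference.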

 

Business Value

Maximizing the value of existing data is always important for a company. Knowing more about customer behavior lets companies make more accurate and responsive decisions, and reach out to specific customers in targeted ways. In this case, the data was used to identify faulty devices drawing too much power and to design more targeted programs aimed at reducing energy usage. It can also provide feedback that encourages households to adopt cleaner appliances by stating expected savings, and support targeted advertisements for new energy-efficient appliances. This kind of analysis can be taken further, with automated feedback, diagnostics, and repair servicing offered to customers to give them greater visibility into their energy expenditure. On a larger scale, this type of data allows more accurate modelling of energy usage across the country and can enable better optimization of energy generation and storage.

Automated Classification of Images

Business Challenge

A huge number of high-quality photographs are being taken in the smartphone era. For many companies there is value locked within these images, but their sheer number makes them tedious to sort through. Identifying features and objects in these images automatically allows companies to gather information in new ways while expending fewer resources. This can be used both with publicly available images and with user-supplied images, allowing companies to get a trained eye on something without actually paying a trained employee to look at it.

One of our clients that needed help in this area was a construction company that wanted to use amateur photos of houses to learn about their layout and dimensions without having to send crews to the site.

 

SFL's Approach

SFL-Scientific is able to combine advanced machine learning algorithms to suit our clients’ needs. Machine vision is a cutting-edge field, and there are many different algorithms, each with strengths and weaknesses in different tasks. In this case we combined the types of algorithms used to create panorama images with convolutional neural nets for object recognition. This allowed us to create an accurate 3D model of a house using only photos of the outside.

 

Business Value

The ability to have a computer instead of a person extract relevant information from an image can save companies a great deal of money, especially if the necessary identification requires a great deal of domain knowledge. Trained humans are expensive, but computers, once trained, are cheap. In this case our algorithm allowed the construction company to forgo sending employees out to properties to collect basic information.