Machine Learning in Finance

There are numerous ways in which machine learning can be applied to finance. In our last two blog posts, we gave a brief overview of how supervised and reinforcement learning can be used to build models that manage and optimize market portfolios.

In this post, we cover a broader range of finance topics related to predicting trends and outliers on both small and large scales.

 

Commercial Growth using Machine Vision and Parking Lots

This is a simple yet interesting example of how we can leverage machine vision methods from data science to predict the revenue and growth of commercial enterprises.

In particular, let’s take a look at Bed Bath and Beyond (BB&B), which was recently in the news for being oversold (http://www.forbes.com/sites/dividendchannel/2016/12/28/bed-bath-beyond-is-now-oversold-bbby-3/#12a0334d6b89). Say we would like to determine whether BB&B has more or fewer customers this year, before the quarterly report is made public.


We can take satellite imagery, which can be purchased online or through agencies, and use machine vision methods to count the vehicles in the parking lot. A popular state-of-the-art approach is the convolutional neural network (CNN), described in our blog post https://sflscientific.com/blog/2015/12/4/image-recognition-getting-value-from-visual-data.

By estimating how full BB&B’s parking lots are country-wide over time, we can build a reasonable indicator of how much business they are doing. We can then use it to help predict their quarterly revenue.
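The following is a minimal sketch of turning car counts into that kind of indicator. It assumes a vision model such as a CNN has already produced one car count per store lot per satellite image. The tuple layout, the function names and the numbers below are hypothetical illustrations, not a real data feed.

```python
from collections import defaultdict

def quarterly_fill_rate(observations, capacity):
    """Average parking-lot fill rate per (year, quarter) across all stores.

    observations: list of (store_id, year, quarter, car_count) tuples,
        assumed to come from an upstream car-detection model.
    capacity: dict mapping store_id -> number of parking spaces.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (year, quarter) -> [sum of fill ratios, count]
    for store_id, year, quarter, cars in observations:
        fill = min(cars / capacity[store_id], 1.0)  # clip detector over-counts at 100%
        totals[(year, quarter)][0] += fill
        totals[(year, quarter)][1] += 1
    return {key: total / n for key, (total, n) in totals.items()}

def year_over_year_change(rates, year, quarter):
    """Relative change in fill rate versus the same quarter one year earlier."""
    previous = rates[(year - 1, quarter)]
    return (rates[(year, quarter)] - previous) / previous

# Hypothetical counts for two stores in Q4 of consecutive years
observations = [("A", 2015, 4, 60), ("B", 2015, 4, 40),
                ("A", 2016, 4, 45), ("B", 2016, 4, 35)]
capacity = {"A": 100, "B": 80}
rates = quarterly_fill_rate(observations, capacity)
print(round(year_over_year_change(rates, 2016, 4), 3))  # -0.193
```

Comparing against the same quarter of the previous year, rather than the previous quarter, is a simple way to avoid mistaking seasonal shopping patterns (such as the holiday rush) for real growth or decline.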

You can use this as one more piece of information in a manual trading strategy, or as just one of many raw features fed into a supervised machine learning model to estimate a fair valuation of BB&B stock.

Predictions using Weather Forecasting


In a similar manner, we can use weather patterns to predict market movements in, say, agricultural produce or power. The input could be raw satellite imagery of farmland, or the rich historical and forecast weather data that is available [some repositories are freely available, with good APIs to get the data, e.g. https://www.programmableweb.com/api/noaa-national-weather-service-nw].

From here, it’s a simple matter of training a supervised learning algorithm on these features, paired with historical trading data for stocks or commodity prices. Then, using freely available or custom-built weather forecasts, we can apply our models to predict market trends.

The exact algorithm will depend very much on your dataset, but most work in a similar way. We gather raw information, called features, that we hope is correlated with what we want to predict. We then feed it into an algorithm together with the known answer (e.g. whether the stock price rose or fell). As the algorithm ingests the data, the model is “trained”. Once the model has been trained on all the historical data available, it can be applied to tomorrow’s data to predict what happens next.
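The training loop described above can be sketched with a plain logistic regression written from scratch. The two features (a scaled rainfall and a scaled temperature anomaly) and the six labelled days are invented for illustration. In practice you would use real weather history, far more data, a held-out test period and an established library.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on log-loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_up(w, b, x):
    """Predicted probability that the price rises, given one day's features."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical history: [rainfall (scaled), temperature anomaly (scaled)] -> 1 if price rose
X = [[0.1, 0.5], [0.2, 0.8], [0.9, -0.4], [1.0, -0.7], [0.0, 0.3], [0.8, -0.5]]
y = [1, 1, 0, 0, 1, 0]
w, b = train_logistic(X, y)
tomorrow = [0.15, 0.6]  # features taken from a weather forecast
print(f"P(price up) = {predict_up(w, b, tomorrow):.2f}")
```

Logistic regression is used here only because it is the simplest classifier to write out in full. Which algorithm actually suits a given dataset is an empirical question.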

Interestingly, you can apply weather information and forecasting to all manner of problems. For example, say you want to know how many people are likely to show up at the bank or store today, or how many will be looking for a notary. Weather may drive much of your customers’ behaviour without you even knowing it. To make up an example, perhaps when it rains there is a 95% chance that no customers will come in to sign up for some sort of insurance [perhaps because most people put this off until a nicer day]. In that case, weather forecasts combined with machine learning can help you schedule your staff optimally.
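Continuing the made-up insurance example, here is a minimal sketch. It estimates from past records how often nobody signs up on rainy versus dry days, then combines those rates with the forecast probability of rain to decide how many staff to roster. The records, the staffing levels and the 0.5 cut-off are all hypothetical.

```python
def prob_no_signups(history, rained):
    """Empirical P(zero sign-ups | weather) from (rained, signups) records."""
    days = [signups for was_rainy, signups in history if was_rainy == rained]
    return sum(1 for s in days if s == 0) / len(days)

def staff_needed(history, p_rain, busy_staff=3, quiet_staff=1, cutoff=0.5):
    """Roster fewer staff when a zero-sign-up day is more likely than not."""
    # Law of total probability over the two forecast outcomes: rain or no rain.
    p_quiet = (p_rain * prob_no_signups(history, True)
               + (1 - p_rain) * prob_no_signups(history, False))
    return quiet_staff if p_quiet > cutoff else busy_staff

# Hypothetical records: (did it rain?, number of sign-ups that day)
history = [(True, 0), (True, 0), (True, 1), (True, 0),
           (False, 4), (False, 2), (False, 0), (False, 5)]
print(staff_needed(history, p_rain=0.9))  # 1
print(staff_needed(history, p_rain=0.1))  # 3
```

A real model would condition on more than rain alone (day of week, season, promotions), but the structure stays the same: learn from history, then feed in the forecast.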

Anomaly Detection to Detect Fraud

Anomaly detection is a standard arena for data science. Simple methods such as the local outlier factor (LOF) and density-based clustering are effective with unlabelled data. We have discussed them previously, so they get only a brief mention here. For time-series data, more sophisticated approaches can be used to flag anomalous series. One is dynamic time warping (DTW, described at https://sflscientific.com/blog/2016/6/3/dynamic-time-warping-time-series-analysis-ii), which is especially useful when you have some known anomalous series to compare against. Another is Seasonal Hybrid ESD, which removes seasonal fluctuations and then looks for large deviations from the median.
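For reference, the DTW distance itself is a short dynamic program. This is the textbook O(nm) version, with absolute difference as the local cost and no warping-window constraint. Flagging a new series by its distance to a known anomalous template, as in the usage lines, is an illustrative assumption with invented data.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b[j-1] is reused
                                 cost[i][j - 1],      # b advances, a[i-1] is reused
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Hypothetical daily spending patterns
known_fraud = [10, 10, 200, 210, 10]
new_series = [10, 10, 10, 205, 205, 10]
normal = [10, 12, 11, 10, 12, 11]
print(dtw_distance(new_series, known_fraud) < dtw_distance(normal, known_fraud))  # True
```

Because DTW lets one series stretch or compress in time against the other, the spike in `new_series` matches the fraud template even though it arrives a day later.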

Whether the problem is credit card fraud, insurance fraud or any other setting where a small number of anomalies hide among, say, 99.99% normal cases, anomaly detection can help you find them.


In the simplest cases, even without time-series information, we can first normalise data on spending habits, income levels, credit history, mortgage rates, etc. We then feed it into an algorithm such as Local Outlier Factor or even k-means. The algorithm will typically use some notion of distance, be it Euclidean distance, cosine similarity or another metric, to find the users, clients or data points most dissimilar from the rest.
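A minimal from-scratch sketch of that pipeline follows: z-score normalisation, Local Outlier Factor with Euclidean distance, then a score threshold. Two simplifications are assumed. Ties at the k-th neighbour are ignored, whereas the original LOF definition includes all tied points. A tiny floor avoids division by zero for duplicate points. The client records and the 1.5 cut-off are hypothetical. For real workloads, scikit-learn's `LocalOutlierFactor` is the usual choice.

```python
import math
import statistics

def zscore_columns(rows):
    """Normalise each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def lof_scores(points, k=3):
    """Local Outlier Factor per point; values well above 1 suggest outliers."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself
    neighbours = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
                  for i in range(n)]
    k_distance = [dist[i][neighbours[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        return 1.0 / max(statistics.fmean(reach), 1e-12)

    density = [lrd(i) for i in range(n)]
    # LOF = average neighbour density relative to the point's own density
    return [statistics.fmean(density[j] for j in neighbours[i]) / density[i]
            for i in range(n)]

# Hypothetical clients: [monthly spend, income, credit score, mortgage rate %]
clients = [
    [1200, 4000, 710, 3.5], [1100, 3800, 700, 3.6], [1300, 4200, 720, 3.4],
    [1250, 4100, 715, 3.5], [1150, 3900, 705, 3.7], [9000, 3000, 560, 6.9],
]
scores = lof_scores(zscore_columns(clients), k=3)
threshold = 1.5  # hypothetical cut-off for manual review
flagged = [i for i, s in enumerate(scores) if s > threshold]
print(flagged)  # [5]
```

Normalising first matters: without it, monthly spend and income (in the thousands) would dominate the distance and the mortgage rate would barely count. The final threshold is what narrows a large customer base down to a short list for manual review.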

Rather than manually investigating potentially millions of customers, you can set a threshold and look only at the most extreme anomalies the algorithm finds. For a more detailed look at anomaly detection, see https://sflscientific.com/case-studies/2016/7/30/anomaly-detection-network-intrusion-detector.

Market Forecasting with NLP Sentiment Analysis

Finally, we’ll discuss market forecasting with sentiment analysis. We have briefly covered sentiment analysis before (https://sflscientific.com/blog/2016/11/17/text-summarization-in-natural-language-processing), so here we simply extend the idea to market forecasts. With the huge influx of online news and social media, it is now relatively straightforward to gauge the sentiment of the internet and use it to predict market trends.

We construct one of the sentiment models described previously and train a machine learning model that relates its output to market movements. Because social media such as Twitter can surface early trends and breaking news, it can give you an edge against the market.
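As a minimal sketch of the first step, the code below scores posts with a tiny word lexicon, averages the scores per day, and measures the Pearson correlation with the following day’s returns. The lexicon, posts and returns are all invented. Real systems would use a trained sentiment model, far more data, and proper out-of-sample testing before trading on anything.

```python
import math

# Hypothetical toy lexicon; production systems use trained models or curated lexicons
LEXICON = {"beat": 1, "growth": 1, "strong": 1, "record": 1,
           "scandal": -1, "recall": -1, "weak": -1, "fraud": -1}

def sentiment(text):
    """Net lexicon score of one post: positive minus negative word hits."""
    return sum(LEXICON.get(word.strip(".,!?").lower(), 0) for word in text.split())

def daily_sentiment(posts_by_day):
    """Average post sentiment for each day."""
    return [sum(map(sentiment, posts)) / len(posts) for posts in posts_by_day]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

posts_by_day = [
    ["Strong quarter, record growth!", "Analysts expect a beat"],
    ["Emissions scandal widens", "Recall fears, weak outlook"],
    ["Steady day", "Nothing new"],
    ["Fraud probe announced", "Scandal deepens"],
]
next_day_returns = [0.012, -0.031, 0.001, -0.045]  # invented, as fractions
scores = daily_sentiment(posts_by_day)
print(scores)  # [2.0, -1.5, 0.0, -1.0]
print(round(pearson(scores, next_day_returns), 2))
```

Pairing each day’s sentiment with the *next* day’s return, rather than the same day’s, avoids accidentally using information the market had already priced in.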

High-profile examples include the VW diesel emissions scandal and country-wide events such as Brexit and the US presidential election. The first had a major impact on VW’s share price, and the latter two on national markets and exchange rates.

Because social media picks up news like this and spreads it so quickly, a good sentiment analysis algorithm could identify the sentiment around VW or Brexit. It could then help you decide what to do with any related stock or currency holdings. Such a model could continuously monitor the current mood around every company you have an interest in, helping you stay ahead of the competition.

Final Thoughts

We covered four simple areas where data science can be applied to finance: machine vision, weather forecasting, anomaly detection and NLP can all help extract value from data.

There are countless avenues for exploring finance with data science. These range from predicting customer attrition and value at banks (using typical market analyses) to optimising schedules by predicting appointment no-shows. Finance is a highly lucrative and varied field in which machine learning can provide significant insight.