Saturday, 14 October 2017

Using Google Trends to Predict Stock Market Movement

Conventionally, financial and economic data such as revenue, profit, debt, return on equity (ROE), gross domestic product (GDP) and inflation are analysed statistically to predict stock market behaviour.  Lately, some traders have started using Google Trends data to predict stock market movements (Read more here).  Google Trends is a public web service from Google Inc., based on Google Search, that shows how often a search term is entered relative to the total search volume across various regions of the world and in various languages (Read more here).
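Google describes Trends values as a term's share of all searches in the chosen region and period, rescaled to 0-100 so that 100 marks the peak. Google does not publish raw search counts, so the sketch below uses made-up numbers, and `trends_index` is a hypothetical helper rather than part of any Google API:

```python
def trends_index(term_counts, total_counts):
    """Turn raw weekly counts into a 0-100 popularity score, Google Trends style (sketch)."""
    # Share of all searches in each week that were for the term.
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    peak = max(shares)
    # Rescale so the most popular week scores 100 and round to whole numbers.
    return [round(100 * share / peak) for share in shares]

# Hypothetical weekly counts: searches for the term vs all searches in the region.
term_counts = [100, 240, 300, 500]
total_counts = [10_000, 12_000, 10_000, 10_000]
print(trends_index(term_counts, total_counts))  # [20, 40, 60, 100]
```

Week 2 scores 40 rather than 48: its raw count rose, but so did total search volume, which is why Trends figures show relative rather than absolute interest.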

One blogger noted that whenever the actress Anne Hathaway’s name was mentioned in the news, shares of Warren Buffett’s Berkshire Hathaway rose (Read more here).  Is this purely coincidence, or are automated trading robots reacting to the name?

Let’s do a simple experiment using Google Trends data to predict the Kuala Lumpur Composite Index (KLCI).  Three search terms were selected: “Malaysia”, “1MDB” and “KLCI”.  The popularity of each search term over time was plotted together with the KLCI.  Chart 1 shows the search term “Malaysia” against the KLCI, Chart 2 shows “1MDB” against the KLCI, and Chart 3 shows “KLCI” against the KLCI.
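
Chart 1: Search Term “Malaysia” and KLCI

Chart 2: Search Term “1MDB” and KLCI

Chart 3: Search Term “KLCI” and KLCI
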
Based on a simple eye-balling inspection, Charts 1 and 2 did not reveal any strong relationship between search popularity and KLCI movement.  Although the KLCI dropped sharply when the popularity of “1MDB” surged in August 2015, the subsequent surge did not move the KLCI drastically.  Chart 3, on the other hand, is more interesting: each time the popularity of the search term “KLCI” peaked, the KLCI tended to reverse its downtrend.
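The eye-balling check on Chart 3 can be made a little more systematic. A rough sketch, assuming a “peak” is any local maximum with a Trends score of at least 50 and a “reversal” means the KLCI fell over the two weeks before the peak and rose over the two weeks after; both thresholds, the helper names and the data are hypothetical:

```python
def popularity_peaks(scores, min_score=50):
    """Weeks where the Trends score is a local maximum at or above min_score."""
    return [i for i in range(1, len(scores) - 1)
            if scores[i] >= min_score and scores[i - 1] < scores[i] >= scores[i + 1]]

def reversed_downtrend(closes, week, window=2):
    """True if the KLCI fell over `window` weeks into `week` and rose over the next `window` weeks."""
    if week - window < 0 or week + window >= len(closes):
        return False  # not enough history on one side to judge
    return closes[week] < closes[week - window] and closes[week + window] > closes[week]

# Hypothetical weekly data, not the series behind Chart 3.
klci_search = [20, 25, 30, 90, 40, 30, 25, 70, 35, 30]
klci_close = [1750, 1730, 1700, 1680, 1690, 1710, 1705, 1660, 1670, 1690]
peaks = popularity_peaks(klci_search)
print(peaks, [reversed_downtrend(klci_close, w) for w in peaks])  # [3, 7] [True, True]
```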

Next, the data were analysed using a basic machine learning algorithm.  Broadly, supervised machine learning in quantitative finance takes two main forms: regression, which predicts a numeric value, and classification, which predicts a category.  For simplicity, classification was chosen for this analysis (Read more here).
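To make the distinction concrete, the same (hypothetical) weekly closes can feed either kind of model: a regression target keeps the size of each move, while a classification target keeps only its direction:

```python
# Hypothetical weekly KLCI closing prices (not real data).
closes = [1700.0, 1712.5, 1690.0, 1690.0]

# Regression target: how much the close moved each week (a real number).
changes = [curr - prev for prev, curr in zip(closes, closes[1:])]

# Classification target: only which way it moved (a category).
directions = ["Up" if c > 0 else "Down" if c < 0 else "Flat" for c in changes]

print(changes)     # [12.5, -22.5, 0.0]
print(directions)  # ['Up', 'Down', 'Flat']
```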

The KLCI data were transformed into four classes, “Up”, “Down”, “Flat” and “Dunno”, based on weekly changes in the closing price.  For example, if week 2’s closing price is higher than week 1’s, week 2 is classified as “Up”; “Down” and “Flat” are assigned in the same way.  In addition, a “Dunno” class was introduced to filter out noise in periods where no search term showed high popularity.
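A minimal sketch of the labelling step. The post does not give the exact rules for “Flat” or “Dunno”, so the 0.2% flat band and the popularity threshold of 50 below are assumptions, as is the choice to judge “high popularity” by the same week's highest Trends score:

```python
# Hypothetical thresholds: the post does not state the values it used.
FLAT_BAND = 0.002      # a weekly move within +/-0.2% is treated as "Flat"
HIGH_POPULARITY = 50   # a Trends score at or above this counts as "high"

def label_week(prev_close, close, popularity):
    """Label one week "Up", "Down" or "Flat", or "Dunno" if no search term was popular."""
    if max(popularity) < HIGH_POPULARITY:
        return "Dunno"  # quiet search week: treat the price move as noise
    change = (close - prev_close) / prev_close
    if change > FLAT_BAND:
        return "Up"
    if change < -FLAT_BAND:
        return "Down"
    return "Flat"

def label_series(closes, popularity_rows):
    """Label weeks 2..n; week 1 has no previous close to compare against."""
    return [label_week(closes[i - 1], closes[i], popularity_rows[i])
            for i in range(1, len(closes))]

# Hypothetical data: weekly closes and ("Malaysia", "1MDB", "KLCI") Trends scores.
closes = [1700, 1720, 1719, 1690]
popularity = [(10, 5, 20), (12, 4, 80), (9, 3, 60), (8, 2, 30)]
print(label_series(closes, popularity))  # ['Up', 'Flat', 'Dunno']
```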

A time lag was also built into the model so that it “predicts” whether the KLCI will be “Up”, “Down”, “Flat” or “Dunno” in the coming week.  In other words, this week’s search-term popularity is used to predict next week’s KLCI behaviour.
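The one-week lag amounts to shifting the labels by one row relative to the features. A sketch with made-up numbers, where `lag_one_week` is a hypothetical helper and `weekly_labels[t]` is assumed to be the KLCI label for the same week as `weekly_features[t]`:

```python
def lag_one_week(features, labels):
    """Pair week t's search popularity (features[t]) with week t+1's KLCI label (labels[t + 1])."""
    # The last week has no "next week" to predict, and the first label has no prior week of searches.
    return features[:-1], labels[1:]

weekly_features = [[2, 1, 25], [3, 1, 40], [2, 2, 90]]  # hypothetical ("Malaysia", "1MDB", "KLCI")
weekly_labels = ["Flat", "Up", "Down"]                   # hypothetical KLCI label for each week
X, y = lag_one_week(weekly_features, weekly_labels)
print(list(zip(X, y)))  # [([2, 1, 25], 'Up'), ([3, 1, 40], 'Down')]
```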

Several algorithms were tested, and the k-nearest neighbours (KNN) algorithm was chosen because it achieved the highest accuracy.  See Picture 1 and Picture 2 for details.


Picture 1: Algorithm Comparison
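The post does not say how accuracy was measured. One simple way to compare candidates on a small data set is leave-one-out evaluation, sketched here in plain Python on toy data (not the KLCI data behind Picture 1). The majority-class baseline and the two KNN settings are illustrative choices, not the algorithms the original comparison used; a library such as scikit-learn would normally handle this through its cross-validation utilities:

```python
import math
from collections import Counter

# Toy training weeks (hypothetical): % change in popularity of
# ("Malaysia", "1MDB", "KLCI") paired with the next week's KLCI label.
X = [[1, 0, 30], [2, 1, 25], [0, 2, 28],
     [1, 1, -20], [2, 0, -25], [0, 1, -30],
     [1, 2, 0], [0, 0, 2], [2, 1, 1]]
y = ["Down", "Down", "Down", "Up", "Up", "Up", "Flat", "Flat", "Flat"]

def knn(train_X, train_y, query, k):
    """Majority label among the k training rows nearest to `query` (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def majority(train_X, train_y, query):
    """Baseline that ignores the features and predicts the most common training label."""
    return Counter(train_y).most_common(1)[0][0]

def loo_accuracy(predict, X, y):
    """Leave-one-out: train on every week except one, then score the held-out week."""
    hits = 0
    for i in range(len(X)):
        prediction = predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i])
        hits += prediction == y[i]
    return hits / len(X)

candidates = {
    "Majority baseline": majority,
    "KNN (k=1)": lambda tx, ty, q: knn(tx, ty, q, k=1),
    "KNN (k=3)": lambda tx, ty, q: knn(tx, ty, q, k=3),
}
scores = {name: loo_accuracy(fn, X, y) for name, fn in candidates.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

On this deliberately well-separated toy data both KNN settings score perfectly, while the baseline, which ignores the search data, gets every held-out week wrong; real market data will be far noisier.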


Picture 2: KNN Confusion Matrix and Accuracy Score
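A confusion matrix counts, for each actual class, how many weeks were predicted as each class; the accuracy score is the diagonal total divided by the number of weeks. A plain-Python sketch with hypothetical predictions (not the figures in Picture 2):

```python
from collections import Counter

LABELS = ["Up", "Down", "Flat", "Dunno"]

def confusion_matrix(actual, predicted, labels=LABELS):
    """Count of weeks per (actual, predicted) pair; rows are actual, columns predicted."""
    pairs = Counter(zip(actual, predicted))
    return [[pairs[(a, p)] for p in labels] for a in labels]

def accuracy(actual, predicted):
    """Fraction of weeks where the predicted label matches the actual one."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical held-out weeks, not the results shown in Picture 2.
actual    = ["Up", "Down", "Up", "Flat", "Dunno", "Down"]
predicted = ["Up", "Up",   "Up", "Flat", "Dunno", "Down"]

for label, row in zip(LABELS, confusion_matrix(actual, predicted)):
    print(f"{label:>5}: {row}")
print(f"accuracy = {accuracy(actual, predicted):.3f}")  # 5 of 6 correct -> 0.833
```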


Now, let’s run some hypothetical test cases to predict KLCI movement.  In Hypothetical 1, assume the search-term popularity values for “Malaysia”, “1MDB” and “KLCI” are 2, 1 and 25 respectively.  This means “Malaysia” and “1MDB” search traffic is almost flat, while “KLCI” search traffic increases by 25%.  The KNN algorithm predicted that the KLCI will go down in the following week.  In Hypothetical 5, both “Malaysia” and “1MDB” are almost flat, but “KLCI” search traffic retreats from a high peak.  The KNN algorithm predicted that the KLCI will go up in the coming week.  The machine learning algorithm therefore gives results similar to the eye-balling observation.  Table 1 shows the KLCI movement predicted by the KNN algorithm for various hypothetical scenarios.

Table 1: KLCI Movement Predicted by KNN for Hypothetical Scenarios
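The hypothetical scenarios can be reproduced in spirit with a tiny KNN classifier. The training weeks below are invented so that a jump in “KLCI” searches is followed by a “Down” week and a retreat by an “Up” week; the -25 used for Hypothetical 5 is also an assumption, since the post gives no figures for it:

```python
import math
from collections import Counter

# Hypothetical training weeks: % change in popularity of
# ("Malaysia", "1MDB", "KLCI"), paired with the following week's KLCI label.
TRAIN_X = [[1, 0, 30], [2, 1, 25], [0, 2, 28],
           [1, 1, -20], [2, 0, -25], [0, 1, -30],
           [1, 2, 0], [0, 0, 2], [2, 1, 1]]
TRAIN_Y = ["Down", "Down", "Down", "Up", "Up", "Up", "Flat", "Flat", "Flat"]

def knn_predict(query, k=3):
    """Return the majority label among the k training weeks nearest to `query`."""
    nearest = sorted(range(len(TRAIN_X)), key=lambda i: math.dist(TRAIN_X[i], query))[:k]
    return Counter(TRAIN_Y[i] for i in nearest).most_common(1)[0][0]

print(knn_predict([2, 1, 25]))   # Hypothetical 1: "KLCI" searches jump 25% -> Down
print(knn_predict([2, 1, -25]))  # Hypothetical 5 (assumed figures): retreat from a peak -> Up
```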

The above is just an illustrative example of how Google Trends data and a machine learning algorithm can work together.  Actual algorithmic trading requires far more intensive research and data processing!