Conventionally, financial and economic data such as revenue,
profit, debt, Return on Equity (ROE), Gross Domestic Product (GDP), and inflation
are analysed statistically to predict stock market behaviour. Lately, some traders have been using Google Trends
data to predict stock market movements (Read
more here). Google Trends is a public
web facility of Google Inc., based on Google Search, that shows how often a search term
is entered relative to the total search volume across various regions of the
world and in various languages (Read more here).
One blogger noted that whenever celebrity Anne Hathaway’s
name was mentioned in the news, shares of Warren Buffett’s Berkshire Hathaway rose (Read more here). Is this purely a coincidence? Or are
automated trading programs at work?
Let’s do a simple experiment, using Google Trends data to
predict the Kuala Lumpur Composite Index (KLCI).
Three search terms, “Malaysia”, “1MDB”, and “KLCI”, were selected. The popularity of each search term over time was
plotted together with the KLCI. Chart 1 shows the search term
“Malaysia” and the KLCI; Chart 2 shows the search term “1MDB” and the KLCI; and Chart 3 shows the search term
“KLCI” and the KLCI.
Based on a simple visual inspection, Chart 1 and Chart
2 did not reveal any strong relationship between search-term popularity and KLCI movement. Although there was a sharp drop in the KLCI when
the popularity of “1MDB” surged in Aug 2015, the subsequent surge did not move
the KLCI drastically. Chart 3, on the other
hand, is more interesting: each time the popularity of the search term “KLCI”
peaked, the KLCI tended to reverse its downtrend.
Next, the data were analysed using a basic machine learning
algorithm. Generally, there are two main
types of machine learning used in quantitative finance: regression and
classification. For simplicity,
classification was chosen for this analysis (Read
more here).
The KLCI data were transformed into four classes, “Up”, “Down”, “Flat”,
and “Dunno”, by calculating the weekly change in closing price. For example, if the week 2 closing price is higher
than the week 1 closing price, week 2 is classified as “Up”. “Down” and “Flat” were assigned in the
same way. Additionally, the “Dunno”
category was introduced to filter out noise in periods where search
popularity was not high.
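The labelling step can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used for the analysis: the function name `label_weeks`, the tolerance `flat_tol` for calling a week “Flat”, and the cut-off `min_popularity` below which a week becomes “Dunno” are all assumptions, as the exact thresholds are not stated here.

```python
def label_weeks(closes, popularity, flat_tol=0.001, min_popularity=10):
    """Label each week (from the second week onward) as Up, Down, Flat, or Dunno.

    closes:      weekly closing prices, oldest first
    popularity:  search popularity for the same weeks
    flat_tol and min_popularity are assumed thresholds for illustration only.
    """
    labels = []
    for i in range(1, len(closes)):
        # Relative change in closing price versus the previous week
        change = (closes[i] - closes[i - 1]) / closes[i - 1]
        if popularity[i] < min_popularity:
            # Low search interest: treat the week as noise
            labels.append("Dunno")
        elif change > flat_tol:
            labels.append("Up")
        elif change < -flat_tol:
            labels.append("Down")
        else:
            labels.append("Flat")
    return labels


# Made-up numbers, purely to show the shape of the input and output
closes = [1700.0, 1710.0, 1710.5, 1650.0, 1660.0]
popularity = [30, 40, 35, 80, 5]
print(label_weeks(closes, popularity))
```

With these made-up numbers the last week is labelled “Dunno” even though the index rose, because its search popularity falls below the assumed cut-off.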
A time lag was also introduced into the model to “predict”
whether the KLCI will be “Up”, “Down”, “Flat”, or “Dunno” in the coming week. As such, this week’s search-term results are
assumed to affect next week’s KLCI behaviour.
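One simple way to build such a one-week lag is to pair each week’s search-term readings with the following week’s label. The sketch below assumes the readings and labels are already aligned week by week; the helper name `make_lagged_pairs` and the sample values are hypothetical.

```python
def make_lagged_pairs(features, labels):
    """Pair week t's search-term features with week t+1's KLCI label.

    features[i] and labels[i] are assumed to describe the same week i.
    The last week's features are dropped because no following label exists yet.
    """
    return list(zip(features[:-1], labels[1:]))


# Made-up readings for ("Malaysia", "1MDB", "KLCI") over three weeks
features = [(2, 1, 25), (3, 1, 40), (2, 2, 10)]
labels = ["Flat", "Down", "Up"]

for week_features, next_week_label in make_lagged_pairs(features, labels):
    print(week_features, "->", next_week_label)
```

Each training example then answers the question “given this week’s search activity, what did the KLCI do next week?”.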
Several algorithms were tested, and the k-nearest neighbours
(KNN) algorithm was chosen because it had the highest accuracy among them. See Picture 1 and Picture 2 for details.
Picture 1: Algorithm Comparison
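For readers unfamiliar with KNN, here is a minimal sketch of the idea in plain Python: to classify a new week, find the k most similar past weeks and take a majority vote of their labels. The training rows below are invented for illustration and are not the data behind the charts; the function name `knn_predict`, the choice of k = 3, and the use of Euclidean distance are likewise assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbours.

    train: list of (features, label) pairs, where features is a tuple of numbers
    query: a feature tuple of the same length
    """
    # Sort past weeks by Euclidean distance to the query
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Count the labels of the k closest weeks and return the most common one
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Invented training rows: ("Malaysia", "1MDB", "KLCI") readings -> next week's label
train = [
    ((2, 1, 25), "Down"), ((3, 2, 30), "Down"), ((1, 1, 28), "Down"),
    ((2, 1, -20), "Up"), ((1, 2, -25), "Up"), ((3, 1, -18), "Up"),
    ((2, 2, 1), "Dunno"),
]

print(knn_predict(train, (2, 1, 24)))
print(knn_predict(train, (2, 1, -22)))
```

In practice a library implementation would normally be used, and the features would usually be scaled first so that no single search term dominates the distance.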
Now, let’s run some hypothetical test cases to predict KLCI
movement. In hypothetical 1, assume
the search-term popularity values for “Malaysia”, “1MDB”, and “KLCI” are 2, 1, and 25
respectively. This means “Malaysia” and “1MDB”
search traffic is almost flat, but “KLCI” search traffic increases by 25%. The KNN algorithm predicted that the KLCI will go
down in the following week. In hypothetical
5, both “Malaysia” and “1MDB” are almost flat, but “KLCI” has retreated from a high
peak. The KNN algorithm predicted that the
KLCI will go up in the coming week. The
machine learning algorithm gives results similar to the eyeballing observation. Table 1 shows the KLCI movement predicted by the KNN algorithm
under various hypothetical scenarios.
Table 1.
The above is just an illustrative example of how Google Trends
and a machine learning algorithm can work together.
Actual algorithmic trading requires more intensive research and data processing
effort!