Engineering2Finance: December 2018

Friday, 21 December 2018

Probit vs Logit

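On 19 December 2018, the S&P500 dropped 1.54%, and on the next day (20 December 2018) the KLCI dropped 0.31%. The linear and logit regression models published on 14 December 2018 (Read more here) predicted that the KLCI would fall 0.37%, with the probability of a drop as high as 75%. In this instance the models got the direction right and the predicted move was within 0.06 percentage points of the actual one, which suggests that the quantitative approach is reasonable, although a single day is only one data point.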

Besides the logit model, one could also use a probit model to run a similar analysis. In the logit model, the log odds of the outcome are modelled as a linear combination of the predictor variables. In the probit model, the inverse of the standard normal cumulative distribution function, applied to the probability, is modelled as a linear combination of the predictors.
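Written out for the single-predictor case used in these posts, the two link functions look as follows. The symbols p (the probability that the KLCI gains), X (the overnight S&P500 return) and b₀, b₁ follow the 14 December post; Φ denotes the standard normal cumulative distribution function. This is the textbook form, given here as a reading aid.

```latex
\text{logit:}\quad  \ln\frac{p}{1-p} = b_0 + b_1 X
  \quad\Longleftrightarrow\quad p = \frac{1}{1 + e^{-(b_0 + b_1 X)}}

\text{probit:}\quad \Phi^{-1}(p) = b_0 + b_1 X
  \quad\Longleftrightarrow\quad p = \Phi(b_0 + b_1 X)
```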

Chart 1 shows the probability plot for both the logit and probit models. The two models should give similar results; the slight difference is that the logit model has fatter tails.

Chart 1
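The similarity of the two curves, and the fatter logit tails, can be checked numerically. The sketch below is not the code behind Chart 1; it compares the standard normal CDF with a logistic curve rescaled by 1.702, a commonly used constant that is an assumption here and does not come from the post.

```python
import math

def logit_prob(z):
    """Logistic CDF: probability implied by a logit model at linear index z."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_prob(z):
    """Standard normal CDF: probability implied by a probit model at index z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Assumed rescaling constant so that logit_prob(SCALE * z) tracks probit_prob(z).
SCALE = 1.702

for z in (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
    print(f"z={z:+.1f}  probit={probit_prob(z):.4f}  logit={logit_prob(SCALE * z):.4f}")

# Largest gap between the two curves on a fine grid from -5 to 5.
max_gap = max(
    abs(probit_prob(i / 100.0) - logit_prob(SCALE * i / 100.0))
    for i in range(-500, 501)
)
print(f"largest gap: {max_gap:.4f}")
```

In the middle of the range the two curves are nearly indistinguishable, while far from zero the logistic curve approaches 0 and 1 more slowly, which is what "fatter tails" means.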

Table 1 is the summary of the probit regression with the estimated coefficients. The p-values show that the slope is significant but the intercept is not. However, the impact of the intercept on the estimated probability is about 0.5%, which is relatively small, and the condition where X = 0 is not modelled in this setup.

Table 1

Friday, 14 December 2018

Logit Regression on Overnight S&P500 Performance Impact on KLCI

We often hear that the overnight performance of US stocks might have an impact on the KLCI the next day. But how should we quantify this? A simple linear regression could be used to estimate the KLCI performance based on overnight Wall Street results. However, the goodness-of-fit is usually poor.

Chart 1 and Table 1 show the regression plot and ANOVA table for the overnight S&P500 and the next day’s KLCI index performance, using daily closing data from November 2015 to December 2018. The R² is low at 0.1237. Nevertheless, the statistical significance of the slope (judged by its p-value) suggests that there is a positive correlation between the S&P500 and the KLCI.

Chart 1

Table 1
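For readers who want to reproduce this kind of fit, here is a minimal sketch of a one-variable least-squares regression with its R². The data are synthetic and generated only for illustration (the slope of 0.2 and the noise level are assumptions, not estimates from the KLCI data), and the helper name `ols_fit` is hypothetical.

```python
import random

def ols_fit(x, y):
    """Fit y = b0 + b1*x by ordinary least squares; return (b0, b1, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return b0, b1, 1.0 - ss_res / ss_tot

# Synthetic daily returns in percent: a weak positive link plus noise.
rng = random.Random(0)
sp500 = [rng.gauss(0.0, 1.0) for _ in range(750)]
klci = [0.2 * s + rng.gauss(0.0, 0.5) for s in sp500]

b0, b1, r2 = ols_fit(sp500, klci)
print(f"intercept={b0:.4f}  slope={b1:.4f}  R^2={r2:.4f}")
```

A setup like this typically gives a clearly positive slope together with a low R², the same combination reported above: the relationship is real on average but explains only a small share of the day-to-day variation.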

Let’s ask the next question: what is the probability that the KLCI closes positively or negatively, given the performance of the overnight S&P500? To answer this, we could use logistic regression (“logit”) to study the probability. Logistic regression is used in various fields, including machine learning (Read more here). It uses a logistic function to model a binary dependent variable. The logistic function is built on a linear regression model.

The linear regression model is

Y = b₀ + b₁X

In the logit model, the Y value is labelled “1” if the KLCI gains on the next day and “0” if the KLCI loses on the next day. X is the overnight S&P500 performance, while b₀ and b₁ are the coefficients.

The probability that Y = 1 for a given X value is

P(Y = 1 | X) = 1 / (1 + e^(−(b₀ + b₁X)))

The coefficients are then estimated using Maximum Likelihood Estimation (MLE).
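As a rough illustration of what MLE does here, the sketch below fits a one-predictor logit model with Newton-Raphson iterations on the log-likelihood. It is not the procedure used to produce the tables in this post. The data are synthetic, the "true" coefficients (0.0 and 0.9) are assumptions chosen for the demonstration, and `fit_logit` is a hypothetical helper name.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logit(x, y, iterations=25):
    """Newton-Raphson MLE for P(Y=1|X) = sigmoid(b0 + b1*X)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Synthetic data: X is a return in percent, Y is 1 for a gain and 0 for a loss.
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [1 if rng.random() < sigmoid(0.0 + 0.9 * xi) else 0 for xi in x]

b0_hat, b1_hat = fit_logit(x, y)
print(f"b0={b0_hat:.3f}  b1={b1_hat:.3f}")
```

At the maximum of the likelihood the gradient is zero, so the fitted probabilities sum to the number of observed gains; the estimates should land near the assumed values, up to sampling noise.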

Table 2 shows the first 5 rows of the data and their respective equations, while Table 3 is the summary of the logistic regression with the estimated coefficients. The p-values show that the slope is significant but the intercept is not. However, the impact of the intercept on the estimated probability is about 0.5%, which is relatively small, and the condition where X = 0 is not modelled in this setup.

Table 2

Table 3

Now back to our question: what is the probability that the KLCI closes positively or negatively, given the performance of the overnight S&P500? Chart 2 shows the probability of a KLCI gain or loss on the next day given the overnight S&P500 performance. The probability curve shows that if the overnight S&P500 gained 5%, it is almost 99% certain that the KLCI will gain on the following day. If the overnight S&P500 gained 1%, the chance of the KLCI gaining on the next day is around 70%. What if the overnight S&P500 loses 2%? Then the probability of the KLCI closing positively on the following day would be around 15%.

Chart 2
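The readings quoted above are enough to back out a rough logistic curve and check that they are mutually consistent. The sketch below solves for the two coefficients from the 1% and −2% readings and then evaluates the curve at 5% and at 0%. The readings are approximate values taken from the text, so the implied coefficients are illustrative only and are not the estimates reported in Table 3.

```python
import math

def log_odds(p):
    """Logit transform: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def logistic(z):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-z))

# Two readings quoted in the text: X in percent, probability of a KLCI gain.
x1, p1 = 1.0, 0.70
x2, p2 = -2.0, 0.15

# Two points pin down the line log_odds(p) = b0 + b1 * X.
b1 = (log_odds(p1) - log_odds(p2)) / (x1 - x2)
b0 = log_odds(p1) - b1 * x1

p_at_5 = logistic(b0 + b1 * 5.0)  # compare with the "almost 99%" reading
p_at_0 = logistic(b0)             # probability of a gain when X = 0

print(f"implied b0={b0:.4f}, b1={b1:.4f}")
print(f"P(gain | +5%) = {p_at_5:.4f}")
print(f"P(gain | 0%)  = {p_at_0:.4f}")
```

Under these assumptions the curve evaluated at 5% comes out close to the quoted 99%, and the probability at X = 0 sits within a fraction of a percentage point of 50%, in line with the remark that the intercept has only a small effect.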

Friday, 7 December 2018

"Qualitative" Regression On KLCI

Ordinary linear regression is quantitative in nature. However, it can also be used as a qualitative measure under certain conditions. For example, to examine the seasonality of stock returns, we could estimate a regression model using “dummy” variables as independent variables. A “dummy” variable takes on a value of 1 if a particular condition is true and 0 if that condition is false.

Using the KLCI monthly closing return data from November 2011 to November 2018, we can estimate a regression including an intercept and 11 dummy variables, one for each of the first 11 months of the year. The equation that we estimate is

Return_t = b₀ + b₁Jan_t + b₂Feb_t + … + b₁₁Nov_t + e_t

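where each monthly dummy variable has a value of 1 when the month occurs and a value of 0 for other months. The intercept, b₀, measures the average return for the KLCI in December because there is no dummy variable for December. Each of the other coefficients measures the difference between that month's average return and December's.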
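The role of the intercept can be verified with a small sketch: build the design matrix with an intercept and 11 dummies, solve the least-squares problem, and compare the coefficients with plain monthly averages. The returns are synthetic and generated only for illustration, and the helper names (`design_row`, `solve`, `ols`, `month_mean`) are hypothetical.

```python
import random

MONTHS = 12

def design_row(month):
    """Intercept plus 11 dummies (Jan..Nov); December (month 12) is the baseline."""
    return [1.0] + [1.0 if month == m else 0.0 for m in range(1, MONTHS)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic monthly returns in percent over 7 full years (illustration only).
rng = random.Random(1)
months = [m for _ in range(7) for m in range(1, MONTHS + 1)]
returns = [rng.gauss(0.0, 3.0) for _ in months]

coef = ols([design_row(m) for m in months], returns)

def month_mean(m):
    """Plain average of the returns observed in calendar month m."""
    vals = [r for mm, r in zip(months, returns) if mm == m]
    return sum(vals) / len(vals)

print(f"intercept b0 = {coef[0]:.4f}, December mean = {month_mean(12):.4f}")
print(f"b1 (Jan)     = {coef[1]:.4f}, Jan mean - Dec mean = {month_mean(1) - month_mean(12):.4f}")
```

The two printed pairs agree: the intercept equals the December average, and each dummy coefficient equals that month's average minus December's.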

The following table shows the results of the regression.

The low R² suggests that a month-of-the-year effect may not be very important for explaining KLCI returns. However, the significance of the F-test is below the conventional level of 5%, which indicates that we can reject the null hypothesis that all of the slope coefficients are jointly equal to 0. This means we could look at the seasonality effect in the months that are statistically significant: December (Intercept), May, June, August, September and November. Amongst those months, only December has a positive coefficient; the others have negative coefficients, meaning their average returns are lower than December's. Will history repeat itself in December 2018, perhaps with window dressing for this holiday season?

Reference:

CFA Program Level II Reading Assignment by Sanjiv R. Das, PhD, Richard A. DeFusco, PhD, CFA, Dennis W. McLeavey, CFA, Jerald E. Pinto, PhD, CFA, and David E. Runkle, PhD, CFA