BSE Sensex vs. BSE IT Index: A Comparative Time Series Analysis

Before undertaking in-depth data analysis, it is important to understand the basic statistical characteristics of the time series. Figure 1 plots the daily returns of the S&P BSE Sensex and S&P BSE IT indices (y-axis) against time (x-axis) over the 2007 to 2017 sample period, with the x-axis labelled 1 to 18 rather than by calendar year. All statistical analysis was performed in EViews 9.5, following each stage of the ARIMA process.

The descriptive statistics for the S&P BSE Sensex and S&P BSE IT returns are summarized in Table 1. The mean returns of both indices are positive but close to zero, suggesting a tendency toward mean reversion in the long run. The range between the minimum and maximum values is wider for S&P BSE Sensex returns (0.1198) than for S&P BSE IT returns (0.0979). The standard deviation, a measure of volatility, is 0.6% for the S&P BSE Sensex and slightly higher, at 0.7%, for the S&P BSE IT, indicating relatively high volatility for both indices during the period examined. The Sensex exhibits positive skewness (0.159), meaning its distribution has a longer right tail, while the S&P BSE IT exhibits negative skewness (−0.145181), meaning a longer left tail. For S&P BSE IT, negative skewness implies frequent modest gains accompanied by occasional large losses, that is, profit opportunities that come with elevated downside risk. The kurtosis values for S&P BSE Sensex and S&P BSE IT are 13.23763 and 8.493113, respectively. Both are well above 3, the kurtosis of a normal distribution, pointing to sharply peaked, fat-tailed (leptokurtic) distributions that deviate from normality. The Jarque-Bera statistics, 11,907.32 for S&P BSE Sensex and 3434.352 for S&P BSE IT, confirm this: both far exceed the 5% critical value of the χ²(2) distribution (about 5.99). Consequently, the null hypothesis of normality is rejected at the 5% level for both indices.
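The Table 1 statistics can be reproduced from a list of daily returns with a short sketch like the one below. It assumes moment-based (divide-by-n) estimators for skewness and kurtosis; packages differ slightly in small-sample conventions, so results may not match EViews to every digit. The function name `describe` is illustrative. Because the Jarque-Bera statistic follows a χ²(2) distribution under normality, its 5% critical value is −2 ln(0.05) ≈ 5.99.

```python
import math

def describe(returns):
    """Descriptive statistics and the Jarque-Bera normality statistic.

    Assumption: skewness and kurtosis use divide-by-n central moments.
    """
    n = len(returns)
    mean = sum(returns) / n
    dev = [r - mean for r in returns]
    m2 = sum(d ** 2 for d in dev) / n   # second central moment
    m3 = sum(d ** 3 for d in dev) / n   # third central moment
    m4 = sum(d ** 4 for d in dev) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                 # equals 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return {
        "mean": mean,
        "std": math.sqrt(sum(d ** 2 for d in dev) / (n - 1)),  # sample SD
        "min": min(returns),
        "max": max(returns),
        "skewness": skew,
        "kurtosis": kurt,
        "jarque_bera": jb,
    }

# chi-squared(2) survival function is exp(-x/2), so the 5% critical value is -2 ln(0.05)
JB_CRITICAL_5PCT = -2 * math.log(0.05)
```

The expression `describe(returns)["jarque_bera"] > JB_CRITICAL_5PCT` is `True` when normality is rejected at the 5% level.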

Variance Ratio Test: Comparing Predictability

The Variance Ratio (VR) test, developed by Lo and MacKinlay (1988, 1989), is a widely used method for assessing the predictability of asset prices. It compares the variances of returns measured over different intervals: if prices follow a random walk, the variance of q-period returns should equal q times the variance of one-period returns (Tabak 2003), so the variance ratio should equal one. This analysis employs the Lo and MacKinlay VR test, its rank, rank-score, and sign-based variants, and the Kim VR test to establish statistical significance. The Lo and MacKinlay (1988, 1989) VR test can be applied under both homoskedastic and heteroskedastic random walks, using either asymptotic normal or wild bootstrap probabilities (Kim 2006). The rank, rank-score, and sign-based tests (Wright 2000) were also evaluated with bootstrap methods. Additionally, Wald and multiple comparison VR tests (Richardson and Smith 1991; Chow and Denning 1993) were conducted jointly across several intervals under the random walk null hypothesis.
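The variance ratio logic can be illustrated with a minimal pure-Python sketch of the Lo and MacKinlay (1988) statistic: VR(q) from overlapping q-period returns, plus the homoskedastic Z(q) and heteroskedasticity-consistent Z*(q) statistics. It applies no small-sample bias correction and uses asymptotic normal inference only; the rank, sign, and wild bootstrap variants used in the study are not implemented. The function name is illustrative.

```python
import math

def variance_ratio(returns, q):
    """Lo-MacKinlay VR(q) with overlapping q-period sums, no bias correction.

    Returns (vr, z_homoskedastic, z_heteroskedastic).
    """
    T = len(returns)
    mu = sum(returns) / T
    dev = [r - mu for r in returns]
    var_1 = sum(d * d for d in dev) / T          # one-period variance

    # Overlapping q-period returns, demeaned by q * mu
    sums_q = [sum(returns[t - q + 1:t + 1]) - q * mu for t in range(q - 1, T)]
    var_q = sum(s * s for s in sums_q) / (T * q)  # per-period variance of q-period returns
    vr = var_q / var_1

    # Asymptotic variance of VR(q) under a homoskedastic random walk
    phi = 2 * (2 * q - 1) * (q - 1) / (3 * q * T)
    z_homo = (vr - 1) / math.sqrt(phi)

    # Heteroskedasticity-consistent asymptotic variance
    sq = [d * d for d in dev]
    denom = sum(sq) ** 2
    theta = 0.0
    for j in range(1, q):
        delta_j = T * sum(sq[t] * sq[t - j] for t in range(j, T)) / denom
        theta += (2 * (q - j) / q) ** 2 * delta_j
    z_hetero = (vr - 1) / math.sqrt(theta)
    return vr, z_homo, z_hetero
```

Looping over q in (2, 4, 8, 16) mirrors the structure of the individual tests: VR(q) below one signals negative autocorrelation, above one positive autocorrelation.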

The analysis utilizes seven series (S1-S7) representing different VR test specifications:

  • S1: Lo and MacKinlay (1988), homoskedasticity, no bias correction, random walk series.
  • S2: Lo and MacKinlay (1988), heteroskedasticity, martingale series.
  • S3: Wright (2000) rank test, random walk series.
  • S4: Wright (2000) rank-score test, random walk series.
  • S5: Wright (2000) sign-based test, martingale series.
  • S6: Kim (2006), homoskedasticity, random walk series (1000 replications).
  • S7: Kim (2006), heteroskedasticity, random walk series (1000 replications).

Holding periods of 2, 4, 8, and 16 days, regarded as short-term horizons (Deo and Richardson 2003), were used for the VR tests. Table 2 presents the results of the standard (Lo and MacKinlay 1988, 1989), non-parametric (Wright 2000), multiple (Chow and Denning 1993), and modified multiple (Belaire-Franch and Contreras 2004) VR tests. The multiple VR tests in column 3 of Table 2 all reject the null hypothesis of a random walk or martingale for the returns of both indices. Columns 4, 5, and 6 report the Z-statistic, VR, and p-value of the individual tests for holding periods of 2, 4, 8, and 16 days; these also consistently reject the null hypothesis at the 1% significance level. Table 2 therefore indicates that returns for both S&P BSE Sensex and S&P BSE IT are strongly predictable from past prices, suggesting that neither index is weak-form efficient. This conclusion aligns with Rapach et al. (2013), who also rejected weak-form market efficiency using similar methods.
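The joint test behind the multiple VR results can be sketched as follows, assuming the Chow and Denning (1993) construction: take the largest absolute individual Z-statistic across the m holding periods and compare it with the studentized maximum modulus (SMM) critical value. With infinite degrees of freedom, that critical value equals the standard normal quantile at the Šidák-adjusted level α* = 1 − (1 − α)^(1/m), which is what the sketch uses. The Z-values below are placeholders, not Table 2 results.

```python
from statistics import NormalDist

def chow_denning(z_stats, alpha=0.05):
    """Chow-Denning multiple variance ratio test.

    z_stats: individual VR Z-statistics, one per holding period.
    Returns (CD statistic, critical value, reject_null).
    """
    m = len(z_stats)
    cd = max(abs(z) for z in z_stats)
    # Sidak-adjusted per-test level; matches the SMM bound with infinite df
    alpha_star = 1 - (1 - alpha) ** (1 / m)
    critical = NormalDist().inv_cdf(1 - alpha_star / 2)
    return cd, critical, cd > critical

# Hypothetical Z-statistics for q = 2, 4, 8, 16
cd, critical, reject = chow_denning([0.5, -3.0, 1.0, 2.0])
```

For four holding periods at the 5% level the critical value is roughly 2.49, so a single |Z| of 3 is enough to reject the joint null.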

Applying the ARIMA Methodology for Forecasting

Applying the Autoregressive Integrated Moving Average (ARIMA) methodology involves two primary phases: first, developing the ARIMA model, and second, validating the model's predictive accuracy against actual data from a holdback period (January 1, 2015, to December 31, 2017) that is not used in estimation. A holdback period of this length is regarded in the literature as adequate for validating predictions. In addition, parameter significance tests and diagnostic tests were conducted to verify that the residuals behave as white noise.
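The estimation/holdback split amounts to a simple date filter. This sketch assumes the data arrive as parallel lists of `datetime.date` objects and returns; observations dated after the holdback window are ignored. The function name `split_holdback` is hypothetical.

```python
from datetime import date

HOLDBACK_START = date(2015, 1, 1)
HOLDBACK_END = date(2017, 12, 31)

def split_holdback(dates, values):
    """Split a series into an estimation sample and a holdback (validation) sample."""
    estimation, holdback = [], []
    for d, v in zip(dates, values):
        if d < HOLDBACK_START:
            estimation.append((d, v))   # used to fit the model
        elif d <= HOLDBACK_END:
            holdback.append((d, v))     # reserved for forecast validation
    return estimation, holdback
```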

Determining ARIMA Model Parameters with Correlograms

Correlograms of the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) are essential for identifying the parameters (p, d, q) of an ARIMA model. The ACF measures the correlation between current first-differenced returns and their values at lags of up to 12 periods, while the PACF measures the correlation between an observation and its value at lag k after removing the effects of the intermediate lags 1 to k − 1. Under the Box-Jenkins methodology, the ACF and PACF help determine the type of ARMA model and suitable values for p (autoregressive order) and q (moving average order). The ACF is calculated using the formula:

$$ \hat{\rho}_k = \frac{\gamma_k}{\gamma_0} $$

Where:

  • \( \hat{\rho}_k \) is the ACF at lag k,
  • \( \gamma_k \) is the autocovariance at lag k, and
  • \( \gamma_0 \) is the sample variance.
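The ACF formula translates directly into code. The sketch below computes the sample ACF with the usual divide-by-n autocovariance and then obtains the PACF from it with the Durbin-Levinson recursion, one standard way of computing partial autocorrelations; EViews may use a different but asymptotically equivalent estimator. Function names are illustrative.

```python
def acf(x, nlags):
    """Sample autocorrelations rho_1..rho_nlags, i.e. gamma_k / gamma_0."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    gamma0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / n / gamma0
            for k in range(1, nlags + 1)]

def pacf(x, nlags):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    rho = [1.0] + acf(x, nlags)
    phi_prev, pac = [], []              # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, nlags + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den              # partial autocorrelation at lag k
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pac.append(phi_kk)
    return pac
```

At lag 1 the PACF always equals the ACF; from lag 2 onward it strips out the correlation already explained by shorter lags.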

Figure 2 displays the correlograms of the first-differenced S&P BSE Sensex and S&P BSE IT returns, reporting the AC, PAC, Q-statistic, and associated probability for 12 lags. The significance of each AC coefficient is assessed using its standard error. The dotted lines in the figure mark the approximate 95% error bounds for the AC and PAC, calculated using the formula:

$$ \pm \frac{2}{\sqrt{T}} $$

Several of the correlations in Figure 2 fall outside these bounds and are therefore statistically significant. With a sample size of n = 2724 (T in the formula above), the standard error is \( \sqrt{1/n} = \sqrt{1/2724} \approx 0.01916 \). The 95% confidence interval for \( \hat{\rho}_k \) is approximately \( 0 \pm 1.98084 \times 0.01916 \), or (−0.037953, 0.037953). Correlation coefficients outside this range are statistically significant at the 5% level. For S&P BSE Sensex, the ACF and PACF are significant at lags 1, 2, 6, and 8, suggesting candidate AR and MA terms at those lags. For S&P BSE IT, significant correlations occur at lags 1, 2, and 5, suggesting candidate AR and MA terms at lags 1, 2, and 5.
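The significance check can be sketched as follows. The multiplier `z` defaults to the normal 1.96; passing the 1.98084 used in the text reproduces the ±0.037953 band for n = 2724. The Ljung-Box statistic, the usual Q-stat column of a correlogram, is included assuming the standard form \( Q_m = n(n+2)\sum_{k=1}^{m} \hat{\rho}_k^2/(n-k) \). Function names are illustrative, and `acf_values` stands for ACF estimates at lags 1, 2, ….

```python
import math

def significance_band(n, z=1.96):
    """Half-width of the approximate confidence band for a sample autocorrelation."""
    return z * math.sqrt(1 / n)

def significant_lags(acf_values, n, z=1.96):
    """1-based lags whose autocorrelation lies outside the band."""
    band = significance_band(n, z)
    return [k for k, r in enumerate(acf_values, start=1) if abs(r) > band]

def ljung_box(acf_values, n):
    """Cumulative Ljung-Box Q-statistics for lags 1, 2, ..."""
    total, q_stats = 0.0, []
    for k, r in enumerate(acf_values, start=1):
        total += r * r / (n - k)
        q_stats.append(n * (n + 2) * total)
    return q_stats
```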

Unit Root Tests for Stationarity

Unit root tests were performed to assess the stationarity of the return series, using the Augmented Dickey-Fuller (ADF), Phillips-Perron (PP), and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests. For the ADF and PP tests, the null hypothesis that the returns series contains a unit root was rejected at the 5% significance level; the KPSS test, whose null hypothesis is stationarity, did not reject. All three tests therefore confirm stationarity for both S&P BSE Sensex and S&P BSE IT returns.

Table 3 summarizes the findings for three test equations:

  • TE1: Test equation with intercept.
  • TE2: Test equation with trend & intercept.
  • TE3: Test equation without intercept.

The results from all three test equations consistently indicate that both S&P BSE Sensex and S&P BSE IT returns are stationary, implying that shocks to the return series are transitory rather than permanent.
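To illustrate what these tests estimate, the sketch below computes the plain Dickey-Fuller t-statistic for the TE1 form (intercept, no trend). It omits the lagged-difference terms that make the test "augmented", so it is a simplification of ADF rather than the test run in the study. The 5% asymptotic critical value of about −2.86 for the intercept-only case comes from the standard Dickey-Fuller (MacKinnon) tables; a t-statistic below it rejects the unit root.

```python
import math

def dickey_fuller_t(y):
    """t-statistic on gamma in  dy_t = alpha + gamma * y_{t-1} + e_t  (no augmentation lags)."""
    x = y[:-1]                                     # lagged level y_{t-1}
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(dy)
    xbar, dbar = sum(x) / n, sum(dy) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    gamma = sum((xi - xbar) * (di - dbar) for xi, di in zip(x, dy)) / sxx
    alpha = dbar - gamma * xbar
    ssr = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = math.sqrt(ssr / (n - 2) / sxx)      # OLS standard error of gamma
    return gamma / se_gamma

DF_CRITICAL_5PCT_INTERCEPT = -2.86                 # asymptotic, constant only
```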

ARIMA Model Estimation and Comparison

ARIMA models, which combine Autoregressive (AR) and Moving Average (MA) components, were estimated by least squares to determine the best-fitting parameters. For S&P BSE Sensex, the initial ARMA specification included lags 1, 2, 6, and 8; for S&P BSE IT, lags 1, 2, and 5. The estimation results for both indices are detailed in Appendix Table 7. For S&P BSE IT, the AR and MA terms at lags 1, 2, and 5 were all significant. For S&P BSE Sensex, however, the MA(8) term was not significant and was removed, so the adjusted ARMA model included AR and MA terms at lags 1, 2, and 6. Appendix Table 8 shows the ARMA estimation based on these adjusted terms. Figure 4 displays the residuals from the least squares regression, which show no systematic pattern.

Model selection was guided by the Akaike Information Criterion (AIC) and the Schwarz Criterion (SC); lower values indicate a better balance between goodness of fit and parsimony. For S&P BSE Sensex, the AIC was −7.019098 for the AR model and −7.304545 for the MA model, and the SC was −7.008247 for AR and −7.293694 for MA. For S&P BSE IT, the AIC was −6.720785 for AR and −7.038715 for MA, and the SC was −6.709934 for AR and −7.027864 for MA. In both cases the MA models had lower AIC and SC values than the AR models, indicating a better fit; for S&P BSE Sensex, the MA terms at lags 1 and 6 were selected. Table 4 presents the AIC and SC values for different model combinations.
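Both criteria can be computed from a model's residuals. This sketch assumes Gaussian errors, the concentrated log-likelihood l, and criteria scaled by the number of observations T (AIC = −2l/T + 2k/T, SC = −2l/T + k ln T / T, the per-observation form EViews reports, which is why the values above are small negative numbers); k is the number of estimated coefficients.

```python
import math

def gaussian_loglik(residuals):
    """Concentrated Gaussian log-likelihood, with sigma^2 estimated as SSR / T."""
    T = len(residuals)
    sigma2 = sum(e * e for e in residuals) / T
    return -T / 2 * (1 + math.log(2 * math.pi) + math.log(sigma2))

def information_criteria(residuals, k):
    """Return (AIC, SC), both scaled by the number of observations T."""
    T = len(residuals)
    ll = gaussian_loglik(residuals)
    aic = -2 * ll / T + 2 * k / T
    sc = -2 * ll / T + k * math.log(T) / T
    return aic, sc
```

Comparing two candidate models then reduces to computing these pairs from each model's residuals and preferring the smaller values; SC penalizes extra coefficients more heavily than AIC once T exceeds about 7.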

Maximum likelihood estimation can also be used, with an optimization algorithm such as Berndt–Hall–Hall–Hausman (BHHH). For ARIMA models the exact likelihood function is complex, but it can be written in terms of the innovations (one-step-ahead prediction errors), which makes it useful for assessing those errors. The lag combinations (1, 6) for S&P BSE Sensex and (1, 2) for S&P BSE IT yielded the best-fitting ARMA models, as illustrated in Figure 3. The residuals from these models were tested with the ADF test, which confirmed that they are stationary.
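Expressing the fit through innovations can be illustrated with conditional sum of squares (CSS), a simpler alternative to exact maximum likelihood swapped in here for clarity: for an MA(1) model the one-step prediction errors are computed recursively, and θ is chosen to minimize their sum of squares. The grid search, the starting value e₀ = 0, and the simulated data are assumptions of this sketch; it is not the estimator used in the study.

```python
import random

def ma1_innovations(y, mu, theta):
    """One-step prediction errors of y_t = mu + e_t + theta * e_{t-1}, with e_0 = 0."""
    errors, e_prev = [], 0.0
    for value in y:
        e = value - mu - theta * e_prev
        errors.append(e)
        e_prev = e
    return errors

def fit_ma1_css(y, steps=199):
    """Grid-search theta in [-0.99, 0.99] minimising the conditional sum of squares."""
    mu = sum(y) / len(y)
    best_theta, best_ssr = 0.0, float("inf")
    for i in range(steps):
        theta = -0.99 + 1.98 * i / (steps - 1)
        ssr = sum(e * e for e in ma1_innovations(y, mu, theta))
        if ssr < best_ssr:
            best_theta, best_ssr = theta, ssr
    return mu, best_theta

# Simulated MA(1) data with theta = 0.5 (illustrative only, not index data)
rng = random.Random(42)
shocks = [rng.gauss(0.0, 0.01) for _ in range(2001)]
y = [shocks[t] + 0.5 * shocks[t - 1] for t in range(1, 2001)]
mu_hat, theta_hat = fit_ma1_css(y)
```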

Auto ARIMA Model Comparison

Auto ARIMA model estimation, utilizing AIC comparisons, was also performed to determine the optimal fit for time series forecasting. This method estimated 25 ARMA term combinations. Table 5 presents the estimated ARMA terms and corresponding AIC values from the Auto ARIMA model.
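An automatic search of this kind reduces to fitting every candidate order and keeping the one with the lowest AIC. The sketch assumes, without confirmation from the text, that the 25 combinations are all (p, q) pairs with p, q ∈ {0, …, 4}; `fit_aic` is a hypothetical callback standing in for any routine that fits an ARMA(p, q) model and returns its AIC.

```python
from itertools import product

def auto_arma(fit_aic, max_p=4, max_q=4):
    """Return ((p, q), aic) for the candidate order with the lowest AIC.

    fit_aic: hypothetical callback, fit_aic(p, q) -> AIC of the fitted ARMA(p, q).
    """
    results = {}
    for p, q in product(range(max_p + 1), range(max_q + 1)):
        results[(p, q)] = fit_aic(p, q)   # 5 x 5 = 25 candidates by default
    best = min(results, key=results.get)
    return best, results[best]
```

Keeping the full `results` dictionary, rather than only the minimum, makes it easy to tabulate every candidate's AIC in the way Table 5 does.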

Forecasting and Validation of ARIMA Models

Once fitted, the ARMA models were used to forecast returns with both static and dynamic methods. Static forecasting uses the actual values of the lagged variables (one-step-ahead forecasts), whereas dynamic forecasting uses previously forecasted values once the forecast period begins. Forecast results for the models shown in Figure 3, under both methods, are presented in Table 6. Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) were used to evaluate forecast accuracy.
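The difference between the two forecasting modes is easiest to see for an AR(1) model, y_t = c + φ y_{t−1} + e_t, used here purely for illustration since the study's selected models include MA terms. Static forecasts always condition on the actual previous value, while dynamic forecasts feed each forecast back in. Function names and the coefficients in the test values are hypothetical.

```python
def static_forecast(y, c, phi, start):
    """One-step-ahead forecasts: each uses the actual previous observation."""
    return [c + phi * y[t - 1] for t in range(start, len(y))]

def dynamic_forecast(y, c, phi, start):
    """Multi-step forecasts: after the first step, each uses the previous forecast."""
    out = []
    prev = y[start - 1]          # last actual value before the forecast window
    for _ in range(start, len(y)):
        prev = c + phi * prev    # feed the forecast back in as the next lag
        out.append(prev)
    return out
```

For an MA(q) model the same distinction applies: dynamic forecasts set unknown future shocks to zero, so beyond q steps they settle at the mean.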

Table 6 shows RMSE and MAE values for S&P BSE Sensex and S&P BSE IT returns. These metrics, calculated from the errors between forecasted and actual data, indicate that the selected ARMA models provide reasonably accurate forecasts for the holdback period.
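Both accuracy measures are simple functions of the forecast errors (actual minus forecast) over the holdback period, as this sketch shows. Because squaring weights large errors more heavily, RMSE is never smaller than MAE for the same set of errors. Function names are illustrative.

```python
import math

def forecast_errors(actual, forecast):
    """Element-wise errors, actual minus forecast."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return [a - f for a, f in zip(actual, forecast)]

def rmse(actual, forecast):
    """Root mean square error."""
    errors = forecast_errors(actual, forecast)
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae(actual, forecast):
    """Mean absolute error."""
    errors = forecast_errors(actual, forecast)
    return sum(abs(e) for e in errors) / len(errors)
```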

The validation phase, crucial for assessing predictive accuracy, was conducted using static forecasting within the ARIMA process. After model estimation, forecasted returns were compared against actual returns from January 1, 2015, to December 31, 2017. Figure 4 visually compares these actual and forecasted values.

In Figure 4(a), “SENSEX_RETF” (blue line) represents the forecasted values and “DSEN” (red dashed line) the first-differenced S&P BSE Sensex returns. The two lines align closely, indicating strong correspondence between forecasted and actual values, with minor deviations around May 2015, August 2015, and February 2016. These deviations correspond to the prediction errors reflected in the RMSE (0.005) and MAE (0.004) in Table 6. Figure 4(b) shows the same comparison for S&P BSE IT, with “IT_RETURNSF” (blue line) for forecasted IT returns and “DIT” (red dashed line) for first-differenced IT returns. Again, the forecasted and actual values align closely, with minor deviations in July 2015, August 2015, July 2016, June 2017, and August 2017, consistent with the RMSE (0.006) and MAE (0.005) in Table 6.

Key Findings: BSE Sensex and BSE IT Market Efficiency Compared

Descriptive statistics for S&P BSE Sensex and S&P BSE IT revealed positive mean returns close to zero, suggesting a long-run tendency toward mean reversion. The negative skewness of S&P BSE IT points to frequent modest gains accompanied by occasional large losses, a high-risk, high-return profile. Jarque-Bera tests rejected the null hypothesis of normality for both indices at the 5% level. Variance Ratio tests, including the standard, non-parametric, multiple, and modified multiple versions, consistently rejected the random walk or martingale null hypothesis for both S&P BSE Sensex and S&P BSE IT returns, implying that returns for both indices are predictable from historical prices. Consequently, the study concludes that neither S&P BSE Sensex nor S&P BSE IT shows evidence supporting the Efficient Market Hypothesis (EMH) over the sample period. Although some information may be incorporated into prices quickly, the predictability of returns from their own history means that neither index satisfies even weak-form efficiency, which semi-strong efficiency presupposes.
