In my recent post I wrote code to download historical data for the companies included in the S&P 500 index. Today I would like to apply statistical procedures to identify whether a given pair of stocks is co-integrated or not.
Since there are approximately 500 companies, that means I will need to run about 500 × 499 / 2 ≈ 125,000 pairwise tests.
First of all, I will divide my data into two parts: a learning period and a testing period. I will find co-integrated pairs in the learning period and test the pair-trading strategy in the testing period.
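To make the pair count and the two periods concrete, here is a minimal sketch in base R. The total of 1000 rows is a made-up figure for illustration, and the windows are assumed to be adjacent and non-overlapping, with 250 testing days and 500 learning days.

```r
# Number of unordered pairs among roughly 500 stocks: n * (n - 1) / 2
n_stocks <- 500
n_pairs <- choose(n_stocks, 2)

# Hypothetical history length (an assumption, not taken from the real data)
n_days <- 1000
test_period <- 250
learning_period <- 500

# The last 250 rows form the testing period,
# the 500 rows directly before them form the learning period
test_idx <- (n_days - test_period + 1):n_days
learn_idx <- (n_days - test_period - learning_period + 1):(n_days - test_period)

cat("pairs to test:", n_pairs, "\n")
cat("learning rows:", min(learn_idx), "to", max(learn_idx), "\n")
cat("testing rows: ", min(test_idx), "to", max(test_idx), "\n")
```

Keeping the two index ranges disjoint means no day used for pair selection is also used for evaluating the strategy.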
Secondly, I will use linear regression to calculate the spread (residuals) of the two corresponding stocks. To find out whether these two stocks are co-integrated or not, I will perform the Augmented Dickey-Fuller unit root test on the spread. The null hypothesis of this test is that the spread has a unit root (it is non-stationary), so a small p-value speaks for a stationary spread and therefore for co-integration. I will save the p-value of this test in a matrix. I will use the ADF test from the fUnitRoots package.
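The regression and unit root steps can be sketched on synthetic data as follows. This is a simplified illustration, not the code from this post: the two price series are simulated, and instead of adfTest from fUnitRoots it computes the plain Dickey-Fuller t-statistic (no constant, no lagged differences) in base R. The helper name df_stat is hypothetical. The value -1.95 is the approximate large-sample 5% critical value for the no-constant case.

```r
set.seed(42)
n <- 500

# Simulated prices (assumption): x is a random walk, y follows 1.5 * x plus stationary noise
x <- 50 + cumsum(rnorm(n))
y <- 1.5 * x + rnorm(n)

# Hedge ratio from a regression through the origin; the residuals are the spread
m <- lm(y ~ x + 0)
beta <- coef(m)[1]
sprd <- resid(m)

# Dickey-Fuller regression without constant: diff(s_t) = gamma * s_(t-1) + e_t
# Returns the t-statistic of gamma; strongly negative values speak against a unit root
df_stat <- function(s) {
  ds <- diff(s)
  lagged <- s[-length(s)]
  fit <- lm(ds ~ lagged + 0)
  unname(coef(summary(fit))[1, "t value"])
}

stat_spread <- df_stat(sprd)
cat("beta:", round(beta, 3), " DF statistic of spread:", round(stat_spread, 2), "\n")
cat("unit root rejected at 5%:", stat_spread < -1.95, "\n")
```

With real data the augmented version of the test is preferable, because the lagged differences absorb serial correlation in the spread that this simplified statistic ignores.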
And finally, I will apply this procedure to all the pairs. Uff.. that means a lot of computational time.. so take a break and have a cup of great coffee. See you in the next part of this tutorial.
Feel free to use the source code. Your valuable comments are more than welcome!
```r
rm(list = ls(all = TRUE))

library(quantmod)
library(tseries)
library(timeDate)
library(fUnitRoots)

# z (the price series) and nrStocks are expected to come from this file
load(file = "/home/robo/workspace/R-Test/abcdeXXX.RData")

ht <- matrix(data = NA, ncol = nrStocks, nrow = nrStocks)    # ADF p-values
beta <- matrix(data = NA, ncol = nrStocks, nrow = nrStocks)  # regression coefficients
sprd <- list()
z_old <- z
nDays <- length(z[, 1])

# setting learning and testing periods
testPeriod <- 250
learningPeriod <- 500

# the "+ 1" keeps the two windows disjoint and exactly 250 / 500 rows long
testDates <- (nDays - testPeriod + 1):nDays
learningDates <- (nDays - testPeriod - learningPeriod + 1):(nDays - testPeriod)

data <- z
z <- z[learningDates, ]
zTest <- data[testDates, ]

# here we go! let's find the cointegrated pairs
for (j in 1:(nrStocks - 1)) {
  for (i in (j + 1):nrStocks) {
    cat("Calculating ", j, " - ", i, "\n")

    # skip pairs where one of the series has no data in the learning period
    if (length(na.omit(z[, i])) == 0 || length(na.omit(z[, j])) == 0) {
      beta[j, i] <- NA
      ht[j, i] <- NA
      next
    }

    # regression through the origin; the residuals are the spread
    m <- lm(z[, j] ~ z[, i] + 0)
    beta[j, i] <- coef(m)[1]
    sprd <- resid(m)

    # ADF test without constant or trend; store the p-value
    ht[j, i] <- adfTest(na.omit(coredata(sprd)), type = "nc")@test$p.value
  }
}

#save(list = ls(all=TRUE), file = "/home/robo/Dropbox/work/FX/PairTrading/cointeg.RData")
```
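Once the loop has finished, the p-value matrix can be scanned for candidate pairs. The sketch below uses a made-up 4 x 4 matrix with invented ticker names and an assumed cut-off of 0.05; only the upper triangle is filled, as in the loop above.

```r
# Hypothetical p-value matrix (all symbols and values are invented for illustration)
tickers <- c("AAA", "BBB", "CCC", "DDD")
ht <- matrix(NA_real_, nrow = 4, ncol = 4, dimnames = list(tickers, tickers))
ht["AAA", "BBB"] <- 0.01
ht["AAA", "CCC"] <- 0.40
ht["AAA", "DDD"] <- 0.03
ht["BBB", "CCC"] <- 0.75
ht["BBB", "DDD"] <- 0.20
ht["CCC", "DDD"] <- 0.04

# Positions with a p-value below the assumed threshold; which() skips NA entries
idx <- which(ht < 0.05, arr.ind = TRUE)

candidates <- data.frame(
  stock1 = rownames(ht)[idx[, "row"]],
  stock2 = colnames(ht)[idx[, "col"]],
  p_value = ht[idx],
  stringsAsFactors = FALSE
)

# Strongest evidence against a unit root first
candidates <- candidates[order(candidates$p_value), ]
print(candidates)
```

The same indexing works on the full matrix, where the row and column positions map back to the columns of z.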
Tags: co-integration, cointegration, pair trading, pair-trading, R, statistical arbitrage
-
Hey you,
I appreciate your nice work here. But reading it, two questions came up:
(1) Why do you choose to do this:
m <- lm(z[, j] ~ z[, i] + 0) instead of m <- lm(z[, j] ~ z[, i])
(2) As I am an R newbie: what does the [1] mean in this expression: beta[j,i] <- coef(m)[1]
Thx and all the best
Max -
Hi,
Could you please tell me how I need to change the load line (load(file = "/home/robo/workspace/R-Test/abcdeXXX.RData")) to make your code run on my machine?
I changed it to load(file = "c:\\abcdeXXX.RData") but I am getting the error message "Error in if (!grepl("RD[AX]2\n", magic)) { : argument is of length zero".
Thanks!
-
You can try the optimized versions of R available at http://www.revolutionanalytics.com/
Much faster computation


