In my recent post I wrote the code to download historical data for the companies included in the S&P 500 index. Today I would like to perform statistical procedures to identify whether a certain pair of stocks is cointegrated or not.
Since there are approximately 500 companies, that means I will need to test roughly 500 × 499 / 2 ≈ 125,000 pairs. First of all I will divide my data into two parts: a learning period and a testing period. I will find cointegrated pairs in the learning period and test the pair-trading strategy in the testing period.
Secondly, I will use linear regression to calculate the spread (residuals) of the two corresponding stocks. To find out whether these two stocks are cointegrated or not, I will perform the Augmented Dickey-Fuller unit root test on the spread to reject or not the null hypothesis of a unit root (that is, of a non-stationary spread). I will save the p-value of this test to a matrix. I will use the ADF test from the fUnitRoots package.
And finally, I will apply this procedure to all the pairs. Uff... that means a lot of computational time, so take a break and have a cup of great coffee. See you in the next part of this tutorial.
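The script further down loads its input from an .RData file and then uses two objects from it, z and nrStocks. As a rough sketch of what such a file could contain, assuming z is a zoo object with one column per stock and one row per day (the tickers, dates, and prices here are simulated placeholders, not real data, and the file name is hypothetical):

```r
library(zoo)

set.seed(1)
nDays   <- 800                              # hypothetical number of observations
tickers <- c("AAA", "BBB", "CCC")           # hypothetical ticker symbols
dates   <- as.Date("2008-01-01") + 0:(nDays - 1)  # calendar days, for simplicity

# Simulated prices: each column is a geometric random walk starting near 100
prices <- sapply(tickers, function(tk) 100 * exp(cumsum(rnorm(nDays, sd = 0.01))))

z        <- zoo(prices, order.by = dates)   # one column per stock, one row per day
nrStocks <- ncol(z)

# Save both objects so that load() restores them under the same names
dataFile <- file.path(tempdir(), "example.RData")
save(z, nrStocks, file = dataFile)
```

Pointing the load() call at a file built this way should give the script the objects it expects, though the real data from the previous post may be organised differently.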
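A quick sanity check of the pair count and the period split might look like the sketch below. nDays is a made-up value; the period lengths are the ones used in the script. Unlike the script, the index ranges here are shifted by one so that the two periods share no day, which is my assumption about the intent.

```r
nrStocks <- 500
nPairs   <- choose(nrStocks, 2)   # number of unordered pairs: 500 * 499 / 2

nDays          <- 1000            # hypothetical length of the price history
testPeriod     <- 250             # last 250 days: testing period
learningPeriod <- 500             # the 500 days before that: learning period

# Row indices of the two periods, without a shared boundary day
testDates     <- (nDays - testPeriod + 1):nDays
learningDates <- (nDays - testPeriod - learningPeriod + 1):(nDays - testPeriod)

cat("pairs to test:", nPairs, "\n")
cat("learning rows:", length(learningDates), " test rows:", length(testDates), "\n")
```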
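For a single pair, the regression and the test can be sketched as follows. The two series are simulated so that they are cointegrated by construction; the variable names are mine and only illustrative. A small p-value speaks against a unit root in the spread, so the pair becomes a candidate. Keep in mind that ADF p-values computed on regression residuals are only approximate, since the standard tables assume an observed series rather than an estimated one.

```r
library(fUnitRoots)

set.seed(42)
n <- 500
x <- 50 + cumsum(rnorm(n))   # simulated random-walk "price" of stock i
y <- 2 * x + rnorm(n)        # stock j: cointegrated with x by construction

m      <- lm(y ~ x + 0)      # regression through the origin, as in the script
beta   <- coef(m)[1]         # the only coefficient: the slope (hedge ratio)
spread <- resid(m)           # residuals = spread between the two series

# ADF test without constant or trend (type = "nc"), as in the script
adf    <- adfTest(spread, type = "nc")
pValue <- adf@test$p.value

cat("beta:", round(beta, 3), " ADF p-value:", pValue, "\n")
```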
Feel free to use the source code. Your valuable comments are more than welcome!
rm(list = ls(all = TRUE))

library(quantmod)
library(tseries)
library(timeDate)
library(fUnitRoots)

# the loaded workspace must provide z and nrStocks, which are used below
load(file = "/home/robo/workspace/RTest/abcdeXXX.RData")

ht <- matrix(data = NA, ncol = nrStocks, nrow = nrStocks)    # ADF p-values
beta <- matrix(data = NA, ncol = nrStocks, nrow = nrStocks)  # regression slopes
sprd <- list()

z_old <- z
nDays <- length(z[, 1])

# setting learning and testing periods
testPeriod <- 250
learningPeriod <- 500

testDates <- (nDays - testPeriod):nDays
learningDates <- (nDays - testPeriod - learningPeriod):(nDays - testPeriod)

data <- z
z <- z[learningDates, ]
zTest <- data[testDates, ]

# here we go! let's find the cointegrated pairs
for (j in 1:(nrStocks - 1)) {
  for (i in (j + 1):nrStocks) {

    cat("Calculating ", j, " - ", i, "\n")

    # skip pairs where one of the series has no data at all
    if (length(na.omit(z[, i])) == 0 || length(na.omit(z[, j])) == 0) {
      beta[j, i] <- NA
      ht[j, i] <- NA
      next
    }

    m <- lm(z[, j] ~ z[, i] + 0)   # regression through the origin
    beta[j, i] <- coef(m)[1]       # slope
    sprd <- resid(m)               # spread of the pair
    ht[j, i] <- adfTest(na.omit(coredata(sprd)), type = "nc")@test$p.value
  }
}

#save(list = ls(all = TRUE), file = "/home/robo/Dropbox/work/FX/PairTrading/cointeg.RData")
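Once the loop has finished, ht holds one p-value per pair in its upper triangle. One possible way to pull out the candidate pairs is sketched below on a tiny stand-in matrix. The 4 x 4 values and the 0.05 threshold are assumptions for illustration only. With this many tests, a fixed threshold will let a fair number of pairs through by chance alone, so treat the result as a shortlist rather than a verdict.

```r
# Hypothetical 4 x 4 p-value matrix standing in for ht (upper triangle only)
ht <- matrix(NA, nrow = 4, ncol = 4)
ht[1, 2] <- 0.30; ht[1, 3] <- 0.01; ht[1, 4] <- 0.20
ht[2, 3] <- 0.04; ht[2, 4] <- NA;   ht[3, 4] <- 0.60

threshold <- 0.05   # assumed significance level

# Positions where the p-value is below the threshold (NA entries are skipped)
idx <- which(ht < threshold, arr.ind = TRUE)

candidates <- data.frame(j = idx[, "row"],
                         i = idx[, "col"],
                         pvalue = ht[idx])
candidates <- candidates[order(candidates$pvalue), ]  # strongest rejections first

print(candidates)
```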
Tags: cointegration, pair trading, R, statistical arbitrage

Hi,
How do you prepare the data series? Could you give an example of how the CSV files have to be formatted to run the script?
Regards.

Could you share the formatting for the data required for this code?
Maybe even a small sample?
Thanks a lot!

Hey you,
I appreciate your nice work here. But reading it, two questions come up:
(1) Why do you choose to do this:
m <- lm(z[, j] ~ z[, i] + 0) instead of m <- lm(z[, j] ~ z[, i])?
(2) As I am an R newbie: what does the [1] mean in this expression: beta[j,i] <- coef(m)[1]
Thx and all the best
Max 
Hi,
Could you please tell me how I need to change line 9 (load(file = "/home/robo/workspace/RTest/abcdeXXX.RData")) to make your code run on my machine?
I changed it to load(file = "c:\\abcdeXXX.RData") but I am getting the error message " Error in if (!grepl("RD[AX]2\n", magic)) { : argument is of length zero ".
Thanks !

You can try the optimized versions of R available at http://www.revolutionanalytics.com/
Much faster computation
