# Pair-Trading with S&P500 Companies - Part I.

In my recent post I wrote the code to download historical data for the companies included in the S&P 500 index. Today I would like to apply statistical procedures to identify whether a given pair of stocks is co-integrated or not.

Since there are approximately 500 companies, I will need to run ${500 \choose 2} = 124\,750$ tests. First of all, I will divide my data into two parts: a learning period and a testing period. I will find the co-integrated pairs in the learning period and test the pair-trading strategy in the testing period.
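To make the bookkeeping concrete, here is a minimal sketch in base R of the pair count and of one way to cut the history into a learning window and a testing window. The total of 1250 trading days is purely an assumption for illustration; the window lengths of 500 and 250 days are the ones used in the script further down.

```r
# Number of unordered pairs among roughly 500 stocks: 500 * 499 / 2
nStocks <- 500
nPairs  <- choose(nStocks, 2)
print(nPairs)

# Assumed length of the price history in trading days (illustrative only)
nDays          <- 1250
testPeriod     <- 250   # most recent rows, held out for testing
learningPeriod <- 500   # rows immediately before the test window

# Row indices of the two windows, chosen so that they do not overlap
testIdx     <- (nDays - testPeriod + 1):nDays
learningIdx <- (nDays - testPeriod - learningPeriod + 1):(nDays - testPeriod)

cat("learning rows:", min(learningIdx), "to", max(learningIdx), "\n")
cat("testing rows: ", min(testIdx), "to", max(testIdx), "\n")
```

Keeping the two index ranges disjoint matters: any row shared between them would let information from the testing period leak into the pair selection.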

Secondly, I will use linear regression to calculate the spread (the residuals) of the two corresponding stocks. To find out whether these two stocks are co-integrated or not, I will perform the Augmented Dickey-Fuller (ADF) unit root test on the spread. The null hypothesis of this test is that the spread has a unit root, i.e. that it is non-stationary, so a small p-value speaks for a stationary spread and therefore for co-integration of the pair. I will save the p-value of this test in a matrix. I will use the ADF test from the fUnitRoots package.
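For a single pair, the procedure might look like the sketch below. The two series are simulated rather than real prices (`x` is a random walk and `y` follows `1.5 * x` plus noise, both assumptions made only so that the example is self-contained), and the fUnitRoots package is assumed to be installed.

```r
library(fUnitRoots)  # provides adfTest()

set.seed(42)
n <- 500

# Simulated "prices" (not market data): x is a random walk,
# y tracks 1.5 * x plus stationary noise, so the pair is co-integrated by construction
x <- 100 + cumsum(rnorm(n))
y <- 1.5 * x + rnorm(n)

# Regression through the origin; the slope plays the role of the hedge ratio
m    <- lm(y ~ x + 0)
beta <- coef(m)[1]

# The spread is the residual series of that regression
sprd <- resid(m)

# ADF test without constant or trend; H0: the spread has a unit root
pValue <- adfTest(sprd, type = "nc")@test$p.value

cat("beta:", round(beta, 3), " ADF p-value:", pValue, "\n")
```

A p-value below the chosen significance level (5%, say) rejects the unit root, which is the signal used to flag the pair as a candidate for trading.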

And finally, I will apply this procedure to all the pairs. Uff... that means a lot of computational time, so take a break and have a cup of great coffee. See you in the next part of this tutorial.

Feel free to use the source code. Your valuable comments are more than welcome!

```r
rm(list = ls(all = TRUE))

library(quantmod)
library(tseries)
library(timeDate)
library(fUnitRoots)

# the .RData file must provide z (price series, one column per stock) and nrStocks
load(file = "/home/robo/workspace/R-Test/abcdeXXX.RData")
ht <- matrix(data = NA, ncol = nrStocks, nrow = nrStocks)    # ADF p-values
beta <- matrix(data = NA, ncol = nrStocks, nrow = nrStocks)  # regression slopes
sprd <- list()

z_old <- z
nDays <- nrow(z)

# setting learning and testing periods
testPeriod <- 250
learningPeriod <- 500

# the two index ranges are kept disjoint so that no day is in both periods
testDates <- (nDays - testPeriod + 1):nDays
learningDates <- (nDays - testPeriod - learningPeriod + 1):(nDays - testPeriod)

data <- z
z <- z[learningDates, ]
zTest <- data[testDates, ]

# here we go! let's find the cointegrated pairs
for (j in 1:(nrStocks - 1)) {
  for (i in (j + 1):nrStocks) {

    cat("Calculating ", j, " - ", i, "\n")

    # skip pairs where one of the series has no observations at all
    if (length(na.omit(z[, i])) == 0 || length(na.omit(z[, j])) == 0) {
      beta[j,i] <- NA
      ht[j,i] <- NA
      next
    }

    # regression through the origin; the residuals are the spread
    m <- lm(z[, j] ~ z[, i] + 0)
    beta[j,i] <- coef(m)[1]

    sprd <- resid(m)

    # ADF test without constant or trend on the spread
    ht[j,i] <- adfTest(na.omit(coredata(sprd)), type = "nc")@test$p.value

  }
}

#save(list = ls(all=TRUE), file = "/home/robo/Dropbox/work/FX/PairTrading/cointeg.RData")
```

1. Hi,

How do you prepare the data series? Could you give an example of how the csv files have to be formatted to run the script?

Regards.

1. Could you share the formatting for the data required for this code?

Maybe even a small sample?

Thanks a lot!

2. Hey you,

I appreciate your nice work here. But reading it, there are two questions coming up:

(1) Why do you choose to do this:
m <- lm(z[, j] ~ z[, i] + 0) instead of m <- lm(z[, j] ~ z[, i] )

(2) As I am an R newbie: what does the [1] mean in this expression: beta[j,i] <- coef(m)[1]

Thx and all the best
Max

3. Hi,

Could you please tell me how I need to change line 9 (load(file = "/home/robo/workspace/R-Test/abcdeXXX.RData")) to make your code run on my machine?

I changed it to load(file = "c:\\abcdeXXX.RData") but I am getting error msg " Error in if (!grepl("RD[AX]2\n", magic)) { : argument is of length zero " .

Thanks !

4. You can try the optimized versions of R available at http://www.revolutionanalytics.com/

Much faster computation

1. Hi Mariachi,

thanks for your comment. I do my research on Linux (Ubuntu), and the community version of Revolution R is available only for Mac and Windows. On the other hand, I can at least try it under Windows and see if I get the results faster.

2. 5 years of data takes about 15-20 mins. Thanks... keep doing this stuff, you may stumble upon the most amazing stuff.