I found an amazing R package in one of the posts on the R-bloggers website. It's called RcppArmadillo and you can find more info here. The function I am using from this package is called fastLm. Since I am interested in the special case of the Ax = b problem where A and b are n x 1 vectors and x is one-dimensional, I ran the following test:
    library(rbenchmark)
    library(RcppArmadillo)

    # One regressor, no intercept: y = x * beta + noise
    x <- matrix(rnorm(1e6), ncol = 1)
    beta <- matrix(0.34, ncol = 1)
    # Gaussian noise, one draw per observation
    y <- x %*% beta + rnorm(nrow(x), 0, 5)
    A <- data.frame(y, x)

    # Time five ways of fitting the same model, 10 runs each
    benchmark(
      lm.fit(x, y),
      fastLm(x, y),
      lm(y ~ x + 0),
      lsfit(x, y),
      lm(y ~ . + 0, data = A),
      columns = c("test", "replications", "elapsed", "relative"),
      order = "relative",
      replications = 10
    )
And obtained these results:
                     test replications elapsed  relative
2            fastLm(x, y)           10   1.384  1.000000
1            lm.fit(x, y)           10   5.529  3.994942
4             lsfit(x, y)           10   5.900  4.263006
5 lm(y ~ . + 0, data = A)           10  20.842 15.059249
3           lm(y ~ x + 0)           10  23.637 17.078757
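As a sanity check on the fitted coefficient: for the single-regressor, no-intercept case tested here, the least-squares solution has the closed form x = (A'b) / (A'A). Below is a minimal sketch in plain Python, using only the standard library. The function name `slope_no_intercept` and the sample numbers are illustrative assumptions and not part of the benchmark.

```python
def slope_no_intercept(a, b):
    """Least-squares x minimising sum((b_i - a_i * x)^2) for vectors a, b."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    # Denominator A'A: sum of squares of the regressor
    saa = sum(ai * ai for ai in a)
    if saa == 0:
        raise ValueError("a must contain at least one non-zero value")
    # Numerator A'b: cross product of regressor and response
    sab = sum(ai * bi for ai, bi in zip(a, b))
    return sab / saa


# Illustrative data only: b is exactly 0.5 * a, so the fit recovers 0.5.
a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 1.0, 1.5, 2.0]
coef = slope_no_intercept(a, b)
```

Any of the R calls benchmarked above should return the same coefficient, up to floating-point error, when given the same data.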
The difference in the timings above is easy to spot, and it is the reason why I am using fastLm from the RcppArmadillo package.
What I have added to this blog is a new page where you can find an example of 30 cointegrated pairs of companies from the S&P 500 index. I am displaying plots from this post. If you have any recommendations for what you would like to see or add to this page, feel free to contact me.
In order to update this page every day, I had to automate the process of downloading data, finding cointegrated pairs, analyzing them and creating the corresponding plots. I used Python for this, and the main script looks like this:
    #!/usr/bin/python
    import subprocess
    import time

    start = time.time()

    # Step 1: run the master R script (download, cointegration, analysis)
    print("Running master R script")
    subprocess.call("R < /home/robo/Desktop/PairTrading/automat/master.R --no-save", shell=True)
    print("Running master R script - done!")

    # Step 2: build the HTML page from the R output
    print("Generating HTML..")
    subprocess.call("python htmlGenerator.py", shell=True)
    print("Done!")

    end = time.time()
    print("Elapsed time: " + str(end - start) + " s")

    print("##################")
    print("### FINISHED ###")
    print("##################")

    # Step 3: importing testFTP runs testFTP.py, which uploads the HTML via FTP
    import testFTP
Here I first run my R code, then generate the HTML output in another Python file, and finally upload the HTML output via FTP to the server. The main R code is as follows:
    # Download data
    source("/home/robo/Desktop/PairTrading/downloadV2.R")

    # Find cointegrated pairs
    source("/home/robo/Desktop/PairTrading/cointegrationV2.R")

    # Analyze data and export output file
    source("/home/robo/Desktop/PairTrading/analysisV2.R")
Simple! This way I can easily maintain and extend my code. Your comments are more than welcome.
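One possible refinement: the master Python script ignores the exit code of each step, so the HTML would still be regenerated and uploaded if the R run failed. Here is a minimal sketch of one way to stop at the first failing step, assuming each step is an external command whose exit code signals success. The helper `run_steps` and the placeholder commands are hypothetical and not part of the original scripts.

```python
import subprocess
import sys
import time


def run_steps(steps):
    """Run each (label, argv) step in order; stop at the first failure.

    Returns a list of (label, return_code) for the steps that were run.
    """
    results = []
    for label, argv in steps:
        print("Running " + label)
        code = subprocess.call(argv)  # returns the command's exit code
        results.append((label, code))
        if code != 0:
            print(label + " failed with exit code " + str(code))
            break
    return results


# Placeholder steps so the sketch runs anywhere; in the setup described
# above they would be the R master script and htmlGenerator.py.
demo_steps = [
    ("step one", [sys.executable, "-c", "print('ok')"]),
    ("step two", [sys.executable, "-c", "import sys; sys.exit(3)"]),
    ("step three", [sys.executable, "-c", "print('never reached')"]),
]

start = time.time()
results = run_steps(demo_steps)
elapsed = time.time() - start
print("Elapsed time: %.2f s" % elapsed)
```

In this demo the second step exits with a non-zero code, so the third step is never run. The FTP upload would be skipped in the same way if it were the last step.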
Tags: pair trading, Python, R

If you are only interested in a few specific outputs of the regression, then it might be even faster to use qr() directly.
As in your example, say one is interested only in the coefs; then qr.coef(qr(x), y) will be even faster than fastLm(x, y), see:
                            test replications elapsed  relative
6              qr.coef(qr(x), y)           10   0.293  1.000000
2                   fastLm(x, y)           10   0.315  1.075085
4 lsfit(x, y, intercept = FALSE)           10   1.150  3.924915
1                   lm.fit(x, y)           10   2.435  8.310580
5        lm(y ~ . + 0, data = A)           10  10.551 36.010239
3                  lm(y ~ x + 0)           10  12.625 43.088737
But if one wants multiple outputs from the regression (the residuals, the coefs, the fitted values, etc.), then fastLm(x, y) will most likely be the fastest.
PS: note that you should use lsfit(x, y, intercept = FALSE), since lsfit fits an intercept by default.

Check out Pi Trading for cheap intraday historical data. It is in ASCII format, though, and I cannot vouch for its quality. http://pitrading.com/intraday_ascii_data_stocks_edition.htm

Hey Quant Trader,
Why are you doing your work and analysis at such extremely low frequencies? The pairs on your "30 pairs from previous trading day" page are mean reverting once a year. That is not exactly cointegrated, and it is useless for investment purposes.
I like your site. If you want to progress from here, try doing the same analysis on cointegrating frequencies equal to 5-30 day mean reversion or, if you want to spice it up, try 8 to 40 hour mean reversion. That is something you will be able to use for serious trading.
Cheers
Chris

How do you intend to put in some kind of volatility filter?
What about money management? Any screening conditions?
Good work...
