Pair-Trading in R - Update

I found an amazing R package in one of the posts on the R-bloggers website. It's called RcppArmadillo and you can find more info here. The function I am using from this package is called fastLm. Since I am interested in the special case of the Ax = b problem where A and b are n x 1 vectors and x is a scalar, I ran the following test:

 

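require(rbenchmark)
require(RcppArmadillo)
 
# Simulate y = 0.34 * x + noise with a single regressor and no intercept
x <- matrix(rnorm(1e6), ncol=1)
beta <- matrix(0.34, ncol=1)
y <- x %*% beta + rnorm(nrow(x), 0, 5)  # one noise draw per observation
A <- data.frame(y, x)
 
# Time five ways of fitting the same regression
benchmark(
  lm.fit(x, y),
  fastLm(x, y),
  lm(y ~ x + 0),
  lsfit(x, y),  # note: lsfit adds an intercept unless intercept=FALSE
  lm(y ~ . + 0, data=A),
  columns=c("test", "replications", "elapsed", "relative"),
  order="relative",
  replications=10
)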

 

I obtained these results:

                     test replications elapsed  relative
2            fastLm(x, y)           10   1.384  1.000000
1            lm.fit(x, y)           10   5.529  3.994942
4             lsfit(x, y)           10   5.900  4.263006
5 lm(y ~ . + 0, data = A)           10  20.842 15.059249
3           lm(y ~ x + 0)           10  23.637 17.078757
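
For the special case benchmarked above (one regressor, no intercept), minimizing ||Ax - b||^2 over a scalar x has the closed form x = (A'b) / (A'A). The following is a minimal Python sketch of that formula using only the standard library. The function name `slope_through_origin` and the smaller sample size are illustrative assumptions, not part of the original benchmark, and the sketch is meant as a sanity check rather than a replacement for fastLm.

```python
import random


def slope_through_origin(x, y):
    """Least-squares slope for y ~ x + 0 (one regressor, no intercept)."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0.0:
        raise ValueError("x must contain at least one non-zero value")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


# Simulate the same model as the R benchmark, with fewer observations
# (assumption: 100,000 points keeps the pure-Python loop quick).
random.seed(1)
n = 100_000
true_beta = 0.34
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [true_beta * xi + random.gauss(0.0, 5.0) for xi in x]

beta_hat = slope_through_origin(x, y)
print("estimated slope:", round(beta_hat, 3))
```

With noise this large relative to the signal, the estimate should land near 0.34 but will not match it exactly.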

The difference in the benchmark is easy to spot, and it is the reason why I am using fastLm from the RcppArmadillo package. I have also added a new page to this blog where you can find an example of 30 co-integrated pairs of companies from the S&P 500 index. There I display the plots from this post. If you have any recommendations about what you would like to see on this page or add to it, feel free to contact me.

In order to update this page every day, I had to automate the process of downloading data, finding co-integrated pairs, analyzing them and creating the corresponding plots. I used Python for this, and the main script looks like this:

 

#!/usr/bin/python
 
import subprocess
import time
 
start = time.time()
 
# Run the master R script non-interactively
print("Running master R script")
subprocess.call("R < /home/robo/Desktop/PairTrading/automat/master.R --no-save", shell=True)
print("Running master R script - done!")
 
# Build the HTML page from the R output
print("Generating HTML..")
subprocess.call("python htmlGenerator.py", shell=True)
print("Done!")
 
end = time.time()
 
print("Elapsed time: " + str(end - start) + " s")
 
print("##################")
print("###  FINISHED  ###")
print("##################")
 
# Upload the generated HTML via FTP (importing the module runs it)
import testFTP

 

The script first runs my R code, then generates the HTML output in another Python file, and finally uploads the HTML output to the server via FTP. The main R code is as follows:

 

# Download data
 
source("/home/robo/Desktop/PairTrading/downloadV2.R")
 
# Find co-integrated pairs
 
source("/home/robo/Desktop/PairTrading/cointegrationV2.R")
 
# Analyze data and export output file
 
source("/home/robo/Desktop/PairTrading/analysisV2.R")

 

Simple! This way I can easily maintain and extend my code. Your comments are more than welcome.
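
One thing the script above does not do is stop when a step fails, so a broken R run could still be followed by HTML generation and an upload. Below is a minimal sketch of the same "run steps in order and time them" idea with return-code checking. It assumes Python 3, and the helper names `run_step` and `run_pipeline` as well as the placeholder commands are hypothetical, not taken from the original scripts.

```python
import subprocess
import sys
import time


def run_step(label, command):
    """Run one step as a subprocess and return its elapsed time in seconds.

    check=True makes subprocess.run raise CalledProcessError when the
    command exits with a non-zero status, so later steps are not run.
    """
    print("Running " + label + "...")
    started = time.time()
    subprocess.run(command, check=True)
    elapsed = time.time() - started
    print(label + ": done")
    return elapsed


def run_pipeline(steps):
    """Run (label, command) pairs in order and collect (label, seconds)."""
    timings = []
    for label, command in steps:
        timings.append((label, run_step(label, command)))
    return timings


# Placeholder commands so the sketch runs anywhere. In the real pipeline
# these would be the R call and the HTML generator from the script above.
steps = [
    ("R analysis (placeholder)", [sys.executable, "-c", "print('analysis')"]),
    ("HTML generation (placeholder)", [sys.executable, "-c", "print('html')"]),
]

timings = run_pipeline(steps)
total = sum(seconds for _, seconds in timings)
print("Elapsed time: " + str(round(total, 2)) + " s")
```

Passing each command as a list avoids going through the shell, which also sidesteps quoting problems in paths.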


  1. Matthieu

    If you are only interested in specific outputs of the regression, then it might be even faster to use qr() directly.

    As in your example, if one is interested only in the coefficients, then qr.coef(qr(x), y) will be even faster than fastLm(x, y):

                                test replications elapsed  relative
    6              qr.coef(qr(x), y)           10   0.293  1.000000
    2                   fastLm(x, y)           10   0.315  1.075085
    4 lsfit(x, y, intercept = FALSE)           10   1.150  3.924915
    1                   lm.fit(x, y)           10   2.435  8.310580
    5        lm(y ~ . + 0, data = A)           10  10.551 36.010239
    3                  lm(y ~ x + 0)           10  12.625 43.088737

    But if one wants several outputs from the regression (the residuals, the coefficients, the fitted values, etc.), then fastLm(x, y) will most likely be the fastest.

    PS: note that you should use lsfit(x, y, intercept = FALSE)


  2. Rohan

    Check out Pi Trading for cheap intraday historical data. It is in ASCII format, though, and I cannot vouch for its quality. http://pitrading.com/intraday_ascii_data_stocks_edition.htm


  3. Chris

    Hey Quant Trader,

    Why are you doing your work and analysis at such extremely low frequencies? The pairs on your "30 pairs from previous trading day" are mean-reverting once a year. Not exactly cointegrated, and useless for investment purposes.

    I like your site. If you want to progress from here, try doing the same analysis on cointegrating frequencies equal to 5-30 day mean reversion or, if you want to spice it up, try 8 to 40 hour mean reversion. That you will be able to use for serious trading.

    Cheers

    Chris


    1. QuantTrader

      Chris,

      thank you for your insightful comment. I learned at school that I should use cointegration when investigating a long-lasting relationship between two time series. That's the reason why I focused on low frequencies (the other one is that I use only end-of-day data from Yahoo and I don't think it is suitable for higher frequencies :) ).
      Do you have any research material about applying pair trading strategies to higher-frequency data?


    2. Shoonya

      How do you intend to put in some kind of volatility filter?
      What about money management?

      Any screening conditions?

      Good work...


      1. QuantTrader

        Hi,

        do you mean some kind of simple or exponential moving average? At the moment I am not paying much attention to money management, but I will in the near future.
        Well, I can add some screening conditions. What do you propose?
